[Development] Why can't QString use UTF-8 internally?

Marc Mutz marc.mutz at kdab.com
Wed Feb 11 10:11:33 CET 2015


On Wednesday 11 February 2015 02:22:45 Thiago Macieira wrote:
> > charT do_toupper(charT c) const;
> > const charT* do_toupper(charT* low, const charT* high) const;
> >
> > 
> >
> > Effects: Converts a character or characters to upper case. The second
> > form replaces each character *p in the range [low,high) for which a
> > corresponding upper-case character exists, with that character.
> >
> > 
> >
> > Returns: The first form returns the corresponding upper-case character if
> > it is known to exist, or its argument if not. The second form returns
> > high.
> 
> The above does not deal with string expansion due to uppercasing (the
> famous  "ß" to "SS" case). The function is flawed by design.

You overlooked "where a corresponding character exists". Either uppercase ß 
exists (it does, it was found in an old printing, so there's a movement to 
adopt it, except Unicode doesn't have it), then it's not a problem, or it 
doesn't (as is the case in Unicode), and the character stays lower-case.
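
To make the distinction concrete, here is a minimal sketch of the two 
behaviours. All names (simple_upper, full_upper) are hypothetical and only 
handle ASCII plus ß; this is neither the standard library's nor Qt's 
implementation. The first pair mimics the one-to-one contract of do_toupper, 
the last function is a length-changing conversion in the spirit of the 
"ß" to "SS" case.

```cpp
#include <string>

// Hypothetical one-to-one mapping, modelled on do_toupper(charT):
// returns the upper-case character if a single one exists, else the argument.
// Only ASCII letters are mapped in this sketch.
char32_t simple_upper(char32_t c) {
    if (c >= U'a' && c <= U'z')
        return c - (U'a' - U'A');
    return c; // e.g. U+00DF (ß): no single-character mapping, left unchanged
}

// Hypothetical in-place range form, modelled on do_toupper(low, high).
// It cannot grow the buffer, so it can never produce "SS" from one ß.
const char32_t* simple_upper(char32_t* low, const char32_t* high) {
    for (; low != high; ++low)
        *low = simple_upper(*low);
    return high;
}

// Hypothetical full mapping: builds a new string, so the result may be
// longer than the input.
std::u32string full_upper(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == U'\u00DF')
            out += U"SS";          // one character expands to two
        else
            out += simple_upper(c);
    }
    return out;
}
```

Applied to U"stra\u00DFe", the in-place form leaves the ß where it is, while 
full_upper returns a string that is one character longer.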

The problem might exist when upper-casing i in Turkish in UTF-8, though 
(because i is US-ASCII, but toUpper(i) isn't).
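
A small sketch of why that matters for an in-place API: in UTF-8, i is one 
byte, while the dotted capital İ (U+0130) that Turkish upper-casing produces 
takes two. The helper name to_utf8 is hypothetical, and it does no validation 
(surrogates and out-of-range values are not rejected).

```cpp
#include <string>

// Hypothetical helper: encodes a single code point as UTF-8.
// No validation is performed; this is only meant to show byte counts.
std::string to_utf8(char32_t c) {
    std::string out;
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return out;
}
```

to_utf8(U'i') is a single byte, whereas to_utf8(U'\u0130') is the two bytes 
0xC4 0xB0, so a function that replaces code units one for one has nowhere to 
put the result.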

> Qt has done this since Qt 2.0 (June 1999), so we're at 15 years ahead and
> counting. I would simply not trust something close to two decades behind us
> to do something they haven't begun to implement yet.

I agree 100% that Qt is light-years ahead of the standard when it comes to all 
things i18n.

But...

You can turn this argument 180° and it explains the situation with container 
classes. For more than 15 years, we have had a vector that can use custom 
allocators and can iterate backwards symmetrically. Yes, vector 
implementations were lacking on existing platforms as little as 8 years ago, 
but what if all the man-power that has gone into making sub-standard container 
classes for Qt had instead gone into contributing to std implementations?
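
For reference, a minimal sketch of the two std::vector features mentioned 
above, a custom allocator and symmetric backward iteration. CountingAllocator, 
allocation_count and reversed_copy are hypothetical names made up for this 
illustration; the allocator only counts calls and otherwise forwards to 
operator new.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical global counter, shared by all instantiations below.
inline std::size_t& allocation_count() {
    static std::size_t n = 0;
    return n;
}

// Hypothetical minimal C++11 allocator that counts allocate() calls.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() noexcept = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++allocation_count();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

using IntVector = std::vector<int, CountingAllocator<int>>;

// Backward iteration uses the same iterator-pair idiom as forward iteration.
IntVector reversed_copy(const IntVector& v) {
    return IntVector(v.rbegin(), v.rend());
}
```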

What does this mean? It simply means: Pick your battles.

There's zero point in trying to re-invent shared_ptr or anything from the STL 
(except fixing max(), but how do you explain the difference to std::max(), 
then?). But std::locale, as is universally agreed, not just in the Qt world, 
is deeply flawed. Here, Qt can contribute. By implementing a better alternative 
(not that I think current QLocale is _that_ great), but also by helping to 
move the standard forward.

We're seeing how we stall development when things that aren't at the core of 
Qt's offering are good enough for Qt. Better invest the little time that is 
spent on the Qt containers these days on improving the STL implementations where 
we find them lacking.

That reminds me to start looking into why std::vector::emplace_back expands to 
1K of text size more than push_back(T&&) on GCC...

Thanks,
Marc

-- 
Marc Mutz <marc.mutz at kdab.com> | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
www.kdab.com || Germany +49-30-521325470 || Sweden (HQ) +46-563-540090
KDAB - Qt Experts - Platform-Independent Software Solutions


