[Development] Why can't QString use UTF-8 internally?
Bo Thorsen
bo at vikingsoft.eu
Wed Feb 11 10:32:31 CET 2015
On 10-02-2015 at 23:17, Allan Sandfeld Jensen wrote:
> On Tuesday 10 February 2015, Oswald Buddenhagen wrote:
>> On Wed, Feb 11, 2015 at 12:37:41AM +0400, Konstantin Ritt wrote:
>>> Yes, that would be an ideal solution. Unfortunately, that would also
>>> break a LOT of existing code.
>>
>> i was thinking of making it explicit with a smooth migration path - add
>> QUtf8String (basically QByteArray, but don't permit implicit conversion
>> to avoid encoding mistakes) and QUcs4String (and QUtf16String as an
>> alias for current QString - for all the windows function calls). the
>> main effort would be adding respective overloads to all our api. then
>> deprecate QString, and prune it in qt6. then maybe re-add it as an
>> alias for utf8string a few minor versions down. does that sound
>> feasible?
>>
> Maybe with C++11 we don't need QString that much anymore. Use std::string with
> UTF8 and std::u32string for UCS4.
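For what it's worth, the explicit-conversion part of that QUtf8String
idea could be as small as the sketch below. The class name is from
Oswald's mail; the rest is just one possible shape, not a worked-out
design:

#include <QByteArray>
#include <QString>

class QUtf8String
{
public:
    QUtf8String() = default;

    // Raw bytes must be labelled as UTF-8 explicitly, so a stray
    // char* can never silently claim to be UTF-8.
    explicit QUtf8String(const QByteArray &utf8) : m_data(utf8) {}

    // Conversions to and from UTF-16 are spelled out by name.
    static QUtf8String fromUtf16(const QString &s)
    { return QUtf8String(s.toUtf8()); }

    QString toUtf16() const { return QString::fromUtf8(m_data); }

    const QByteArray &bytes() const { return m_data; }

private:
    QByteArray m_data; // always holds valid UTF-8
};

The whole point is the explicit keyword: mixing up encodings becomes a
compile error instead of a silent mis-decode.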
Using std::string with UTF-8, on the other hand, would make me very
unhappy. I'm working on a customer project right now that uses
std::string all over the place, and there is real pain involved. It's
an almost empty layer over char* and brings none of the features of
QString. Of all the failures of the C++ standards committee,
std::string is the worst.
Any string class has to be Unicode. What it uses internally is an
implementation detail (which is what started this thread). It's fine to
have a pure ASCII string type as well, but there are very few cases left
in real-world applications where this is useful.
What QString internally uses is a pure optimization question, and I'll
leave that to others. But whatever is decided, I want to be sure it
keeps some of the things QString offers:
1) Unicode! Don't assume the user remembers to use UTF-8.
qlabel->setText(stdString) *will* fail. Leaving decisions on encoding to
users is a bad idea.
2) length() returns the number of chars I see on the screen, not a
random implementation detail of the chosen encoding.
3) at(int) and [] give the Unicode character, not a random unit of the
chosen encoding.
std::string fails at these completely basic requirements, which is why
you will never see me use it, unless some customer API demands it or I'm
in one of those exceptional cases where the strings are guaranteed to
contain only ASCII.
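To make the difference concrete, here is a small self-contained sketch
(Qt 5 assumed; "Ørsted" is just an arbitrary non-ASCII example):

#include <QDebug>
#include <QString>
#include <iostream>
#include <string>

int main()
{
    // "Ørsted": six characters on screen, seven bytes in UTF-8,
    // because Ø is the two-byte sequence 0xC3 0x98.
    std::string utf8 = "\xC3\x98rsted";

    // std::string counts and indexes raw bytes.
    std::cout << utf8.size() << '\n';                 // 7
    std::cout << int((unsigned char)utf8[0]) << '\n'; // 195 (0xC3), half a code point

    // QString decodes the bytes once, at a known boundary,
    // and then counts and indexes characters.
    QString q = QString::fromUtf8(utf8.c_str());
    std::cout << q.length() << '\n';                  // 6
    qDebug() << q.at(0);                              // QChar for U+00D8, 'Ø'

    // (To be fair: for characters outside the Basic Multilingual Plane,
    // QString::length() counts UTF-16 code units, so a surrogate pair
    // counts as two. For everyday text the numbers above hold.)
    return 0;
}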
Another note: Latin-1 is the worst idea for i18n ever invented, and it's
by now useless, irrelevant and only a source of bugs once you start to
truly support i18n outside the USA and Western Europe. I would be one
step closer to total happiness if C++17 and Qt 7 made this "encoding"
completely unsupported.
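To illustrate the kind of bug I mean (Qt 5 API; "café" is just an
arbitrary example):

#include <QByteArray>
#include <QDebug>
#include <QString>

int main()
{
    // UTF-8 bytes for "café" - 'é' is the two-byte sequence 0xC3 0xA9.
    const QByteArray utf8Bytes("caf\xC3\xA9");

    // Decoded with the right codec: four characters.
    qDebug() << QString::fromUtf8(utf8Bytes);   // "café"

    // Decoded as Latin-1: every byte becomes one character, so the
    // two bytes of 'é' turn into mojibake.
    qDebug() << QString::fromLatin1(utf8Bytes); // "cafÃ©"
    return 0;
}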
I know I've made the statements here a bit harsh, but I see the same
kinds of problems again and again in customer code when they choose to
use std::string all over the place. They give the same arguments I've
seen here - optimized, faster, etc. - and add a few like "easier to
switch away from Qt", "the backend is std/boost only, no Qt allowed",
and so on. And they pay for it in development time, bug fixing and angry
users.
Sure, QString isn't optimized for some cases. But I'll take a less
optimized class any day over something that brings heaps of bugs. Then I
have time to focus on optimizing the serious things instead of fixing bugs.
Bo Thorsen,
Director, Viking Software.
--
Viking Software
Qt and C++ developers for hire
http://www.vikingsoft.eu