[Development] Status on QString's UTF-8 changes
Thiago Macieira
thiago.macieira at intel.com
Wed May 2 16:26:42 CEST 2012
Context, for anyone reading this message on its own: we're changing the
default encoding for QString's methods that deal with 8-bit data. In Qt 3 and
4, the encoding was variable and defaulted to Latin 1 (set with
QTextCodec::setCodecForCStrings). In Qt 5, the variability was removed,
leaving fromAscii == fromLatin1. We're NOT changing how QString internally
stores data; that will remain UTF-16.
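To make the difference concrete, here is a minimal sketch in plain standard C++ (no Qt) of what decoding the same 8-bit data as Latin 1 versus UTF-8 into UTF-16 looks like. The helper names decodeLatin1 and decodeUtf8Bmp are hypothetical and are not Qt's implementation; the UTF-8 helper assumes only 1- to 3-byte sequences need handling and substitutes U+FFFD for anything else, which is a choice made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not Qt code: Latin 1 maps every byte to the
// code point with the same numeric value, so decoding is a widening copy.
std::u16string decodeLatin1(const std::string &bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char c : bytes)
        out.push_back(static_cast<char16_t>(c));
    return out;
}

// Hypothetical helper, not Qt code: decodes 1- to 3-byte UTF-8 sequences
// (the Basic Multilingual Plane). Malformed input, overlong forms, encoded
// surrogates and 4-byte sequences are replaced by U+FFFD, one per bad byte
// or bad sequence. That policy is an assumption of this sketch.
std::u16string decodeUtf8Bmp(const std::string &bytes)
{
    std::u16string out;
    const std::size_t n = bytes.size();
    auto byteAt = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    auto isCont = [&](std::size_t i) {
        return i < n && (byteAt(i) & 0xC0) == 0x80;
    };

    std::size_t i = 0;
    while (i < n) {
        const unsigned b0 = byteAt(i);
        if (b0 < 0x80) {
            // 7-bit ASCII: identical in Latin 1 and UTF-8.
            out.push_back(static_cast<char16_t>(b0));
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0 && isCont(i + 1)) {
            // Two-byte sequence: U+0080..U+07FF.
            const unsigned cp = ((b0 & 0x1F) << 6) | (byteAt(i + 1) & 0x3F);
            out.push_back(cp >= 0x80 ? static_cast<char16_t>(cp) : u'\uFFFD');
            i += 2;
        } else if ((b0 & 0xF0) == 0xE0 && isCont(i + 1) && isCont(i + 2)) {
            // Three-byte sequence: U+0800..U+FFFF, excluding surrogates.
            const unsigned cp = ((b0 & 0x0F) << 12)
                              | ((byteAt(i + 1) & 0x3F) << 6)
                              | (byteAt(i + 2) & 0x3F);
            const bool bad = cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF);
            out.push_back(bad ? u'\uFFFD' : static_cast<char16_t>(cp));
            i += 3;
        } else {
            // Stray continuation byte, truncated or unsupported sequence.
            out.push_back(u'\uFFFD');
            i += 1;
        }
    }
    return out;
}
```

With these helpers, the two bytes 0xC3 0xA9 decode to the two code units U+00C3 U+00A9 as Latin 1 but to the single code unit U+00E9 as UTF-8, while pure 7-bit input such as "Qt" decodes identically either way. That is why only data containing non-7-bit bytes is affected by the change of default.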
A number of commits have been accepted into Qt 5 that dealt with the encoding
of source files. I think I caught all source code that contained non-7-bit
characters and reencoded it to UTF-8. There are surprisingly few such files
in Qt. I've also wrapped all uses of the "ascii" functions that contained
Latin1 data with QString::fromLatin1.
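As an illustration of what "non-7-bit" means here, the following is a small sketch in standard C++ of a check for bytes outside the 7-bit ASCII range. The function name isSevenBit is hypothetical; this is not the tool that was used to find the affected sources.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns true if every byte is 7-bit ASCII, i.e. the
// data means the same thing whether it is read as Latin 1 or as UTF-8.
bool isSevenBit(const std::string &bytes)
{
    return std::all_of(bytes.begin(), bytes.end(),
                       [](unsigned char c) { return c < 0x80; });
}
```

Data for which this returns false is the kind that needs either reencoding to UTF-8 or an explicit Latin 1 conversion.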
The following two pending commits change the QString 8-bit functions to use
UTF-8, by *temporarily* changing fromAscii to mean fromUtf8, and toAscii to
mean toUtf8.
https://codereview.qt-project.org/24700
https://codereview.qt-project.org/24701
tests: https://codereview.qt-project.org/24702
They have been tested in qtbase and no regressions have been found. I do not
believe they should cause regressions in other modules.
I'm now testing a series of changes that switch fromAscii to fromUtf8 and
correct one or two encoding mistakes I think I've found. Since fromAscii ==
fromUtf8 at this point in the test, the change is technically a no-op and I
expect no regressions at all. Those changes cover the few places in the code
where the data seemed to be non-Latin1 in origin, as well as QString itself.
Next, I'll change all remaining fromAscii to fromLatin1 and toAscii to
toLatin1. Since that's what those functions were before (still are right now
in qtbase master), I also expect no regressions. Then I'll deprecate the Ascii
functions.
Finally, probably starting two weeks from now when I'm back from the US, I'll
start benchmarking and optimising the fromUtf8 function, as well as merging
the many UTF-8 encoders and decoders in Qt (yes, we have more than one).
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden