[Qt-interest] Messed-up encoding to UTF-8
Thiago Macieira
thiago.macieira at trolltech.com
Sat Jul 18 18:22:54 CEST 2009
Jeffrey Brendecke wrote:
>I am running into a problem that appears to occur only on Linux, with
>Qt 4.5.1 and g++ 4.2.4.
>
>The corresponding Windows version of Qt with the same code compiled on
>msvc2005 does not seem to have this problem.
>
>It appears that when converting a QString to a UTF-8-encoded
>QByteArray, the bytes with the high-order bit set are being
>over-encoded. Unicode characters that would normally take two bytes
>in UTF-8 are being turned into four bytes.
>
>Take, for example, the wide character 'Ä':
>
>Normally, it would be encoded from 16-bit Unicode to utf-8 as:
>
>\xc3\x84
>
>Instead, I am seeing the following bytes being generated:
>\xc3\x83\xc2\x84
>
>Whether I call the QString::toUtf8() method or do something like this:
>
>QTextCodec* codec = QTextCodec::codecForName( "UTF-8" );
>inBuf = codec->fromUnicode( str );
>
>... the result is the same.
>
>This used to work on the Linux computer with the same Qt binaries and
>somehow stopped.
>
>It seems that there is some static instance that gets messed up and the
>problem perpetuates.
>
>Would anyone have an idea what could be going on here?
There's nothing wrong with the Qt encoding.
Your source data is already UTF-8 encoded, but it was decoded as Latin-1
when it was read into the QString, so each UTF-8 byte became a separate
character. When you then call toUtf8(), those characters are encoded a
second time.
Please check your source data.
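
To make the double encoding concrete, here is a minimal sketch in plain C++
without Qt. The helper latin1ToUtf8() is a hypothetical name, not a Qt
function; it only imitates what happens when bytes are treated as Latin-1
characters and then written out as UTF-8. It is an illustration of the
mechanism under that assumption, not Qt's actual implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: treat every input byte as one Latin-1 character
// (U+0000..U+00FF) and encode that character as UTF-8.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string out;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII range: one byte, unchanged.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Reproduces the byte sequences quoted in the question.
void checkDoubleEncoding()
{
    // 'Ä' is U+00C4; encoded once, it becomes the expected two bytes.
    const std::string once = latin1ToUtf8("\xC4");
    assert(once == "\xC3\x84");

    // If those two bytes are misread as two Latin-1 characters and
    // encoded again, the result is the four bytes seen in the report.
    const std::string twice = latin1ToUtf8(once);
    assert(twice == "\xC3\x83\xC2\x84");
}
```

In Qt terms, the usual remedy is to decode UTF-8 input with
QString::fromUtf8() instead of QString::fromLatin1() or an implicit
conversion from char*, so that toUtf8() later round-trips to the original
bytes.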
--
Thiago Macieira - thiago.macieira (AT) nokia.com
Senior Product Manager - Nokia, Qt Software
Sandakerveien 116, NO-0402 Oslo, Norway