[Qt-interest] Messed-up encoding to UTF-8
Jeffrey Brendecke
jwbrendecke at icanetix.com
Sat Jul 18 17:42:47 CEST 2009
I am running into a problem that appears to occur only on Linux with Qt 4.5.1
using g++ 4.2.4.
The corresponding Windows version of Qt, with the same code compiled with
msvc2005, does not seem to have this problem.
It appears that when converting a QString to a UTF-8-encoded QByteArray,
bytes with the high-order bit set are being over-encoded: Unicode characters
that would normally result in two bytes in UTF-8 are being turned into four
bytes.
Take, for example, the wide character 'Ä' (U+00C4).
Normally, it would be encoded from 16-bit Unicode to UTF-8 as:
\xc3\x84
Instead, I am seeing the following bytes being generated:
\xc3\x83\xc2\x84
Whether I call the QString::toUtf8() method or do something like this:
    QTextCodec* codec = QTextCodec::codecForName( "UTF-8" );
    inBuf = codec->fromUnicode( str );
... the result is the same.
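For reference, the four bytes shown above are exactly what comes out when the
correct two-byte sequence is treated as two Latin-1 characters and encoded to
UTF-8 a second time. The sketch below reproduces that arithmetic in plain C++
without Qt. It is only an illustration of how such bytes can arise, not a
confirmed diagnosis of this problem; the helper names encodeUtf8 and
bytesAsLatin1 are made up for the example, and the encoder handles only code
points below U+0800.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: encode code points below U+0800 as UTF-8
// (one or two bytes each). Larger code points are out of scope here.
std::string encodeUtf8(const std::vector<unsigned int>& codePoints)
{
    std::string out;
    for (std::size_t i = 0; i < codePoints.size(); ++i) {
        const unsigned int cp = codePoints[i];
        assert(cp < 0x800);
        if (cp < 0x80) {
            out += static_cast<char>(cp);                   // 0xxxxxxx
        } else {
            out += static_cast<char>(0xC0 | (cp >> 6));     // 110xxxxx
            out += static_cast<char>(0x80 | (cp & 0x3F));   // 10xxxxxx
        }
    }
    return out;
}

// Hypothetical helper: misread a byte string as Latin-1, which maps every
// byte to the code point with the same numeric value.
std::vector<unsigned int> bytesAsLatin1(const std::string& bytes)
{
    std::vector<unsigned int> out;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        out.push_back(static_cast<unsigned char>(bytes[i]));
    return out;
}
```

Passing the single code point U+00C4 to encodeUtf8 gives \xc3\x84. Passing that
result through bytesAsLatin1 and then encodeUtf8 again gives \xc3\x83\xc2\x84.
If that is what is happening here, the QString would already hold the two
characters U+00C3 and U+0084 before toUtf8() is called, which would be
consistent with both conversion paths producing the same output.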
This used to work on the Linux computer with the same Qt binaries and somehow
stopped.
It seems that some static instance gets messed up and the problem then
persists.
Would anyone have an idea what could be going on here?