[Qt-interest] Messed-up encoding to UTF-8

Sat Jul 18 17:42:47 CEST 2009

I am running into a problem that appears to occur only on Linux, with Qt 4.5.1 
built with g++ 4.2.4.

The corresponding Windows version of Qt, with the same code compiled with 
MSVC 2005, does not seem to have this problem.

It appears that when converting a QString to a UTF-8-encoded QByteArray, the 
bytes with the high-order bit set are being encoded a second time. Unicode 
characters that would normally result in two bytes in UTF-8 are being turned 
into four bytes.

Take, for example, the wide character 'Ä' (U+00C4):

Normally, it would be encoded from 16-bit Unicode to UTF-8 as:

\xc3\x84

Instead, I am seeing the following bytes being generated:
\xc3\x83\xc2\x84
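
As a cross-check, these four bytes are what comes out when the correct two 
bytes are each misread as a single Latin-1 character and then encoded as 
UTF-8 a second time. The sketch below is plain standard C++ with no Qt in it; 
the helper names encodeUtf8 and encodeUtf8Again are made up for illustration. 
It only reproduces the byte pattern under that assumption and does not show 
where in Qt or in the application a second pass would happen.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8.
// Illustration only: handles code points up to U+FFFF and does no validation.
std::string encodeUtf8( unsigned long cp )
{
    std::string out;
    if ( cp < 0x80 ) {
        out += static_cast<char>( cp );
    } else if ( cp < 0x800 ) {
        out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    } else {
        out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
        out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
        out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
    }
    return out;
}

// Assumed failure mode: every byte of an already UTF-8-encoded string is
// treated as a Latin-1 character (code point == byte value) and encoded again.
std::string encodeUtf8Again( const std::string& bytes )
{
    std::string out;
    for ( std::string::size_type i = 0; i < bytes.size(); ++i ) {
        const unsigned char b = static_cast<unsigned char>( bytes[i] );
        out += encodeUtf8( b );
    }
    return out;
}
```

Calling encodeUtf8( 0xC4 ) gives \xc3\x84, and passing that result to 
encodeUtf8Again gives \xc3\x83\xc2\x84, which matches the bytes I am seeing.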

Whether I call the QString::toUtf8() method or do something like this:

// str is the QString being converted
QTextCodec* codec = QTextCodec::codecForName( "UTF-8" );
QByteArray inBuf = codec->fromUnicode( str );

... the result is the same.

This used to work on the same Linux machine with the same Qt binaries, and 
then somehow stopped.

It seems that some static instance gets corrupted, and the problem then 
persists.

Would anyone have an idea what could be going on here?