[Qt-interest] QString::toLower() issues with two-byte chars

Thiago Macieira thiago at kde.org
Sat Mar 13 00:27:03 CET 2010


On Friday 12 March 2010, at 15.08.14, Øyvind Vågen Jægtnes wrote:
> On Fri, Mar 12, 2010, Thiago Macieira wrote:
> > On Friday 12 March 2010, at 03.50.28, Øyvind Vågen Jægtnes wrote:
> > > Thank you, they both worked like a charm!
> > > I forget that not everyone is sitting at a UTF-8 enabled terminal ;)
> > > 
> > > But this brings up another issue that might come down the road. What
> > > happens if someone runs this program in a latin1 terminal and inputs
> > > the same types of chars? Is there a way to detect the encoding of the
> > > terminal one is running at?
> > 
> > Your issue has nothing to do with the terminal/locale encoding. It's the
> > source file encoding. Qt has separate settings for each of those two.
> > 
> > The locale encoding is QTextCodec::codecForLocale and is automatically
> > detected. Whenever you use QString::fromLocal8Bit and
> > QString::toLocal8Bit, you're going through the locale mapper. Whenever
> > you read data from "the terminal", like you said, you have to go through
> > this mapper too, to obtain proper Unicode strings.
> > 
> > The source file encoding is QTextCodec::codecForCStrings and is not
> > automatically detected. It defaults to Latin 1. QString::fromAscii,
> > QString::toAscii and all the QString conversions to and from const char*
> > and QByteArray use this encoding. And that was your problem: your source
> > code was UTF-8, but QString interpreted your byte array as Latin 1. So
> > instead of reading "æøå", the string was actually "Ã¦Ã¸Ã¥".
> 
> Yes I have been reading up a bit on this now :)
> I recognized the UTF-8 chars as they look in ASCII, but I didn't find
> the QTextCodec docs. It is quite obvious now that I have read them.
> 
> The question of how to detect the locale in the future was more in
> regards to when I accept data from the network into this application.
> I think I will use QTextCodec::codecForHtml() if this becomes an
> issue. (Most people use Unicode nowadays, don't they?)
> 
> Thank you for the explanation though, I feel that I have better
> understanding of the whole thing now.

You cannot detect the encoding of the source file. The problem is that the 
codecForCStrings setting applies to ALL strings in the program, regardless of 
which source file or library they come from.

So my recommendation is that you leave it at the default, Latin 1, and do not 
change it to UTF-8. That means you should not write any non-ASCII string 
literals that go through the automatic QString conversions.

Instead, write QString::fromUtf8 when you need a string that is not ASCII.
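
To make the difference between the two interpretations concrete, here is a
minimal sketch in plain C++. It does not use Qt and is not how QString is
implemented. The helpers decodeLatin1 and decodeUtf8 are hypothetical names
made up for this illustration, and decodeUtf8 only handles one- and two-byte
sequences, which is enough for "æøå".

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: read the bytes as Latin 1, where every byte is
// exactly one character. This mirrors what happens when UTF-8 source bytes
// go through a Latin 1 conversion.
std::vector<std::uint32_t> decodeLatin1(const std::string &bytes)
{
    std::vector<std::uint32_t> out;
    for (unsigned char c : bytes)
        out.push_back(c);
    return out;
}

// Hypothetical helper: read the bytes as UTF-8. Only one- and two-byte
// sequences (U+0000..U+07FF) are handled; anything else becomes U+FFFD.
std::vector<std::uint32_t> decodeUtf8(const std::string &bytes)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char c = bytes[i];
        if (c < 0x80) {
            out.push_back(c);                        // plain ASCII
        } else if ((c & 0xE0) == 0xC0 && i + 1 < bytes.size()) {
            unsigned char d = bytes[i + 1];
            out.push_back(((c & 0x1Fu) << 6) | (d & 0x3Fu));
            ++i;                                     // consumed the second byte
        } else {
            out.push_back(0xFFFD);                   // not handled in this sketch
        }
    }
    return out;
}
```

Given the six bytes "\xC3\xA6\xC3\xB8\xC3\xA5" (that is, "æøå" saved as UTF-8),
decodeLatin1 returns six characters starting with U+00C3 and U+00A6 ("Ã¦"),
while decodeUtf8 returns the three intended characters U+00E6, U+00F8 and
U+00E5. In a Qt program you would not write such helpers yourself; passing the
same bytes to QString::fromUtf8 gives the three-character string.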

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Senior Product Manager - Nokia, Qt Development Frameworks
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358