[Qt-interest] Accented characters and QDom in Windows

Thiago Macieira thiago.macieira at trolltech.com
Fri Aug 21 11:14:17 CEST 2009


On Friday 21 August 2009, at 10:02:51, Ellen Kestrel wrote:
> Just to be clear, here, I am not passing QStrings to the fromXXX functions.
> I don't think I've ever used those functions for anything other than string
> literals.  The characters that cannot be input into the line edits are
>  (that I've noticed) slightly less common IPA characters (ɾ ʃ ʧ ʤ, for
>  example).

Hi Ellen

There should be no problem with those characters.

I'm seeing them right now in a Qt 4 application that uses QTextEdit (KMail, 
or rather KTextEdit). I can copy and paste them without a problem: ɾ ʃ ʧ ʤ

Can you post a small code sample showing how the problem happens?

> Another strange thing I have noticed now, which might be relevant here, is
> that the files containing accented and special characters that were written
> to disc under linux (in Utf-8) display the special characters as "Ã¡" and
> the like in Windows, which looks like some kind of encoding mismatch.

That's correct: it's an encoding mismatch. Please make sure your Windows 
program is reading those files in UTF-8.

Most Windows programs will try to read 8-bit files under the "ANSI" encoding, 
which is the name of the legacy codec on Windows. Unless you're using 
Vietnamese Windows, that's not UTF-8. It's going to be CP 1251, 1252, etc.
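
To make the mismatch concrete, here is a minimal sketch in plain C++ (no Qt) 
of what a viewer does when it decodes UTF-8 bytes with a single-byte legacy 
codec. It treats every byte as one Latin-1 character, which is a 
simplification: CP 1252 differs from Latin-1 in the 0x80-0x9F range. The 
function name misreadAsLatin1 is made up for this illustration. With it, the 
two UTF-8 bytes of á (0xC3 0xA1) come out as the two characters Ã and ¡.

```cpp
#include <string>

// Re-encode a byte string as UTF-8 after (wrongly) treating every byte as
// one Latin-1 character. This mimics opening a UTF-8 file with a
// single-byte legacy codec. Hypothetical helper, not part of Qt.
std::string misreadAsLatin1(const std::string &utf8Bytes)
{
    std::string out;
    for (unsigned char byte : utf8Bytes) {
        if (byte < 0x80) {
            // ASCII bytes mean the same thing in both encodings.
            out += static_cast<char>(byte);
        } else {
            // A Latin-1 character U+0080..U+00FF needs two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (byte >> 6));   // lead byte
            out += static_cast<char>(0x80 | (byte & 0x3F)); // continuation
        }
    }
    return out;
}
```

In a Qt 4 program the usual way to avoid this is to tell the reader which 
codec to use, for example with QTextStream::setCodec("UTF-8") before reading, 
or QString::fromUtf8() on the raw bytes.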

Qt programs on Windows are no exception: the "System" codec 
(fromLocal8Bit/toLocal8Bit) on Windows does not default to UTF-8. On modern 
Linux systems, the "System" codec is almost certainly UTF-8, and it is 
definitely UTF-8 on Mac OS X.

> I
> vaguely remember seeing this happen a long time ago when I tried to get a
> QLabel to display special characters that were entered into the source code
> without doing toUtf8 () on them.  The Windows app is able to parse those
> files just fine, even when reading them in via QTextStream.  I'm viewing
> these files with gvim for Windows.

gvim probably applies the same heuristics as Qt. Make sure that you reload the 
file as UTF-8 (in gvim: :e ++enc=utf-8; to open a file with Emacs: C-x RET c 
utf-8 C-x C-f filename; or if you already have it open: C-x RET r utf-8).

Emacs also reads the following in the first line of the file (or the second, 
if the first is a #! line) and applies the codec successfully:

	-*- coding: utf-8 -*-

Python uses the same process. C++0x should do the same.
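
As a rough sketch of how a tool might look for such a marker, the function 
below searches the first two lines of a text for a "coding:" or "coding=" 
declaration. It is modelled on the pattern Python documents in PEP 263, but 
simplified: it does not require the line to be a comment. The name 
declaredCoding is made up for this illustration and is not an Emacs, Python 
or Qt API.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Return the encoding named by a "coding:" or "coding=" declaration found
// in the first two lines of `text`, or an empty string if there is none.
// Hypothetical helper for illustration only.
std::string declaredCoding(const std::string &text)
{
    static const std::regex pattern("coding[:=][ \\t]*([-_.a-zA-Z0-9]+)");
    std::istringstream lines(text);
    std::string line;
    // Only the first two lines are inspected, as in PEP 263.
    for (int i = 0; i < 2 && std::getline(lines, line); ++i) {
        std::smatch match;
        if (std::regex_search(line, match, pattern))
            return match[1].str();
    }
    return std::string();
}
```

Because the pattern matches "coding" anywhere in the line, it also accepts 
spellings such as "fileencoding=latin-1" in a vim modeline.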

> I did compile with the no-casting-to/from-ascii defines, and it only causes
> a few errors from setting the text of QPushButtons on initialization with a
> hardcoded string that contains only standard, unaccented characters.

-- 
Thiago Macieira - thiago.macieira (AT) nokia.com
  Senior Product Manager - Nokia, Qt Development Frameworks
     Sandakerveien 116, NO-0402 Oslo, Norway
