[Qt-interest] How to compress QString

Thiago Macieira thiago.macieira at trolltech.com
Thu Apr 16 17:26:43 CEST 2009


On Thursday 16 April 2009, at 16:58:51, Jan Kundrát wrote:
> Oliver.Knoll at comit.ch wrote:
> > How is UTF-8 more stable and better documented than UTF-16? I thought
> > the range of Unicode characters is pretty well-defined? Or am I
> > missing something here?
>
> By the time I wrote that mail, I wasn't aware of the fact that Qt
> declares that it uses UTF-16 for QStrings internally. If there was no
> such guarantee, you'd have problems when touching QString's char*
> returned by data().

QString::data() doesn't return char*. It returns QChar*.

Remember: QString is an array of QChar, each of which is a UTF-16 code unit. If
you treated QString::data() as char*, you'd probably run into problems because
every other byte is 0 for ASCII and Latin 1 text (those characters are less
than 0x100, so the high byte is 0).
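
To illustrate the zero-byte problem without depending on Qt, here is a minimal sketch that uses std::u16string as a stand-in for QString's UTF-16 storage. The helper names rawBytes and countZeroBytes are made up for this example and are not Qt API. It assumes char16_t is 2 bytes wide, which holds on common platforms.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copy the raw bytes of a UTF-16 string, i.e. what a char* view of the
// buffer would see. (Hypothetical helper, not part of Qt.)
std::vector<unsigned char> rawBytes(const std::u16string &s)
{
    std::vector<unsigned char> bytes(s.size() * sizeof(char16_t));
    if (!bytes.empty())
        std::memcpy(bytes.data(), s.data(), bytes.size());
    return bytes;
}

// Count the zero bytes in that raw view. For text below U+0100 the high
// byte of every code unit is 0, whatever the byte order of the machine.
std::size_t countZeroBytes(const std::u16string &s)
{
    std::size_t zeros = 0;
    for (unsigned char b : rawBytes(s))
        if (b == 0)
            ++zeros;
    return zeros;
}
```

For u"Qt" this view holds four bytes, two of which are zero, so any C string function that stops at the first 0 byte would see at most one character.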

> However, I'm not sure if the size of the str.utf16() is really
> str.size() * sizeof(ushort) -- what happens if there are surrogate pairs
> in the string? Will they be reflected in QString's size()?

It's UTF-16, not UCS-4 or some other weird encoding that treats codepoints 
above U+FFFF differently.

That means surrogate pairs may appear in the string, in their correct order.

That means QString::length() (and size() and count()) return the number of
UTF-16 code units (16-bit words), not the number of Unicode codepoints. Also
note that the number of codepoints is different from the string's width, even
in a monospace font (there are codepoints with zero width, normal width or
double width).
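
The difference between code units and codepoints can be sketched as follows, again with std::u16string standing in for QString so the example does not need Qt. Here s.size() plays the role the message describes for QString::size(); the function names codePointCount and utf16ByteSize are hypothetical. Counting an unpaired surrogate as one codepoint is a choice made for this sketch only.

```cpp
#include <cstddef>
#include <string>

// Count Unicode codepoints in a UTF-16 string. A high surrogate
// (0xD800..0xDBFF) immediately followed by a low surrogate
// (0xDC00..0xDFFF) encodes a single codepoint above U+FFFF.
// An unpaired surrogate is counted as one codepoint here (an assumption).
std::size_t codePointCount(const std::u16string &s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t unit = s[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < s.size()) {
            const char16_t next = s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF)
                ++i; // skip the low surrogate, it belongs to this codepoint
        }
        ++count;
    }
    return count;
}

// Size of the UTF-16 data in bytes: code units times the unit width.
std::size_t utf16ByteSize(const std::u16string &s)
{
    return s.size() * sizeof(char16_t);
}
```

For u"a\U0001D11E" (the letter a followed by U+1D11E, which needs a surrogate pair), size() is 3 code units, the byte size is 6 where char16_t is 2 bytes, and the codepoint count is 2. The byte size follows the code unit count, not the codepoint count.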

-- 
Thiago Macieira - thiago.macieira (AT) nokia.com
  Senior Product Manager - Nokia, Qt Software
     Sandakerveien 116, NO-0402 Oslo, Norway