[Qt-interest] Does Qt support Unicode 5.1?

Shaun Cummins cumminss at gmail.com
Fri Mar 6 07:16:55 CET 2009


>static QChar HighSurrogate (unsigned c)
>{
>     return QChar(((c - 0x10000) >> 10 & 0x3ff) + 0xd800);
>}
>
>static QChar LowSurrogate (unsigned c)
>{
>     return QChar((c & 0x3ff) + 0xdc00);
>}
>
>To Qt developers: Your surrogate characters handling code seems to be
>wrong

Would you care to elaborate? What's wrong with QChar::lowSurrogate and
QChar::highSurrogate?

    static inline ushort highSurrogate(uint ucs4) {
        return (ucs4>>10) + 0xd7c0;
    }
    static inline ushort lowSurrogate(uint ucs4) {
        return ucs4%0x400 + 0xdc00;
    }

In addition, from the Unicode FAQ
http://www.unicode.org/faq/utf_bom.html#utf16-3:

// constants
const UTF32 LEAD_OFFSET = 0xD800 - (0x10000 >> 10);

// computations
UTF16 lead = LEAD_OFFSET + (codepoint >> 10);
UTF16 trail = 0xDC00 + (codepoint & 0x3FF);

(LEAD_OFFSET is equal to 0xd7c0)
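
To check that the two formulations really are equivalent, here is a minimal sketch that compares them over every supplementary code point (0x10000 through 0x10FFFF). The function names are my own, chosen for illustration; they are not Qt or Unicode API names.

```cpp
#include <cstdint>

// Subtract-then-mask form, as in the quoted HighSurrogate().
inline std::uint16_t highBySubtract(std::uint32_t c) {
    return static_cast<std::uint16_t>((((c - 0x10000u) >> 10) & 0x3ffu) + 0xd800u);
}

// Offset form, as in QChar::highSurrogate() and the Unicode FAQ.
inline std::uint16_t highByOffset(std::uint32_t c) {
    return static_cast<std::uint16_t>((c >> 10) + 0xd7c0u);
}

// Low surrogate via bit mask, as in the quoted LowSurrogate().
inline std::uint16_t lowByMask(std::uint32_t c) {
    return static_cast<std::uint16_t>((c & 0x3ffu) + 0xdc00u);
}

// Low surrogate via modulo, as in QChar::lowSurrogate().
inline std::uint16_t lowByModulo(std::uint32_t c) {
    return static_cast<std::uint16_t>(c % 0x400u + 0xdc00u);
}

// Returns true if both pairs of formulas agree for every code point
// in the supplementary planes.
inline bool formulasAgreeOnSupplementaryPlanes() {
    for (std::uint32_t c = 0x10000u; c <= 0x10FFFFu; ++c) {
        if (highBySubtract(c) != highByOffset(c)) return false;
        if (lowByMask(c) != lowByModulo(c)) return false;
    }
    return true;
}
```

The agreement follows from 0x10000 being a multiple of 0x400: subtracting it before the shift is the same as subtracting 0x40 after the shift, and 0xd800 - 0x40 is 0xd7c0.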

I agree with Thiago. The only difference I can see is for junk input:
given a value beyond the range of the Unicode standard (above 0x10FFFF),
highSurrogate() returns a different result than HighSurrogate(), because
only the latter masks the shifted value with 0x3ff. I usually go by the
axiom "junk in, junk out" and don't try to condition bad data to fit
within the valid range. The Unicode standard is future-proof in this
regard: the Consortium has promised never to add more characters than
will fit within the current number of available planes, so any code
point outside of this range is guaranteed to be invalid now and in the
future.
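
To illustrate the junk-input case, here is a small sketch with a range check a caller could apply before computing surrogates. The names are hypothetical and only for illustration; they are not part of Qt.

```cpp
#include <cstdint>

// True only for code points that need a surrogate pair in UTF-16.
inline bool isSupplementaryCodePoint(std::uint32_t c) {
    return c >= 0x10000u && c <= 0x10FFFFu;
}

// Masked variant (the quoted HighSurrogate()): always lands in 0xD800..0xDBFF.
inline std::uint16_t maskedHigh(std::uint32_t c) {
    return static_cast<std::uint16_t>((((c - 0x10000u) >> 10) & 0x3ffu) + 0xd800u);
}

// Unmasked variant (QChar::highSurrogate() style): no conditioning of bad input.
inline std::uint16_t unmaskedHigh(std::uint32_t c) {
    return static_cast<std::uint16_t>((c >> 10) + 0xd7c0u);
}
```

For the out-of-range value 0x110000, maskedHigh() gives 0xD800 and unmaskedHigh() gives 0xDC00. Neither result is meaningful, which is why checking isSupplementaryCodePoint() first seems preferable to relying on either formula to tame bad data.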



