[Development] Why can't QString use UTF-8 internally?

Konstantin Ritt ritt.ks at gmail.com
Thu Feb 12 08:55:56 CET 2015


2015-02-12 11:53 GMT+04:00 Konstantin Ritt <ritt.ks at gmail.com>:

> 2015-02-12 11:39 GMT+04:00 Rutledge Shawn <Shawn.Rutledge at theqtcompany.com>:
>
>>
>> On 11 Feb 2015, at 18:15, Konstantin Ritt <ritt.ks at gmail.com> wrote:
>>
>> > FYI: Unicode codepoint != character visual representation. Moreover, a
>> single character could be represented with a sequence of glyphs or vice
>> versa - a sequence of characters could be represented with a single glyph.
>> > QString (and every other Unicode string class in the world) represents
>> a sequence of Unicode codepoints (in this or that UTF), not characters or
>> glyphs - always remember that!
>>
>> Is it impossible to convert some of the possible multi-codepoint
>> sequences into single ones, or is it just that we prefer to preserve them
>> so that when you convert back to UTF you get the same bytes with which you
>> created the QString?
>>
>
> Not sure I understand your question in the context of visual representation.
> I assume you're talking about composing the input string (though the same
> string, composed and decomposed, would be shaped into the same sequence of
> glyphs).
> A while ago we decided not to change the composition form of the input
> text and to let users (de)compose it wherever they need a fixed composition
> form, so that QString(wellformed_unicode_text).toUnicode() ==
> wellformed_unicode_text.
>

P.S. We could reconsider this, or could introduce a macro that would change
the composition form of a QString input, but... why?
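
To make the round-trip point concrete, here is a minimal sketch in plain
standard C++ (no Qt). StoredText and composeAcuteE are hypothetical names
invented for this illustration: StoredText only mimics "keep the input as
given", and composeAcuteE is a toy that composes a single pair, not real
Unicode normalization. In Qt itself, the explicit step would be something
like QString::normalized(QString::NormalizationForm_C).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a string class that stores its input untouched.
class StoredText {
public:
    explicit StoredText(std::u16string input) : units_(std::move(input)) {}
    const std::u16string &toUnicode() const { return units_; }

private:
    std::u16string units_;
};

// Toy composer (assumption for illustration only): handles just
// 'e' + U+0301 COMBINING ACUTE ACCENT -> U+00E9. Real normalization
// needs the full Unicode composition data.
std::u16string composeAcuteE(const std::u16string &in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == u'e' && i + 1 < in.size() && in[i + 1] == u'\u0301') {
            out.push_back(u'\u00E9');
            ++i; // skip the combining mark that was just consumed
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

void demo()
{
    const std::u16string composed = u"\u00E9";    // one code point
    const std::u16string decomposed = u"e\u0301"; // two code points

    // Same glyph on screen, different code point sequences.
    assert(composed != decomposed);

    // Storing without changing the composition form round-trips exactly.
    assert(StoredText(decomposed).toUnicode() == decomposed);

    // Composing on input would not: the caller gets different data back.
    assert(composeAcuteE(decomposed) != decomposed);
    assert(composeAcuteE(decomposed) == composed);
}
```

Both strings would be shaped into the same glyph, but only the version that
leaves the input alone hands the caller back exactly what was put in.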