[Development] Why can't QString use UTF-8 internally?

Thu Feb 12 08:53:38 CET 2015

2015-02-12 11:39 GMT+04:00 Rutledge Shawn <Shawn.Rutledge at theqtcompany.com>:

>
> On 11 Feb 2015, at 18:15, Konstantin Ritt <ritt.ks at gmail.com> wrote:
>
> > FYI: Unicode codepoint != character visual representation. Moreover, a
> > single character could be represented with a sequence of glyphs or vice
> > versa - a sequence of characters could be represented with a single glyph.
> > QString (and every other Unicode string class in the world) represents a
> > sequence of Unicode codepoints (in this or that UTF), not characters or
> > glyphs - always remember that!
>
> Is it impossible to convert some of the possible multi-codepoint sequences
> into single ones, or is it just that we prefer to preserve them so that
> when you convert back to UTF you get the same bytes with which you created
> the QString?
>

I'm not sure I understand your question in the context of visual
representation. I assume you're talking about composing the input string
(though the same string, composed or decomposed, would be shaped into the
same sequence of glyphs).
A while ago we decided not to change the composition form of the input text
and to let users (de)compose wherever they need a fixed composition form, so
that QString(wellformed_unicode_text).toUnicode() ==
wellformed_unicode_text.
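
To make the composition point concrete, here is a minimal sketch. It is
Python rather than Qt code and uses only the standard unicodedata module, so
it illustrates the Unicode behaviour and not QString's implementation. The
variable names and the same_text() helper are made up for this example. In
Qt, the explicit step would be QString::normalized() with, for example,
QString::NormalizationForm_C.

```python
import unicodedata

# The same visible text "é" in two composition forms.
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT

# As code point sequences they are different strings.
assert composed != decomposed
assert len(composed) == 1 and len(decomposed) == 2

# Storing and re-encoding does not touch the composition form,
# so each input round-trips unchanged.
for text in (composed, decomposed):
    assert text.encode("utf-8").decode("utf-8") == text


def same_text(a, b, form="NFC"):
    """Compare two strings after bringing both to one normalization form.

    Hypothetical helper: callers who need a fixed composition form
    normalize explicitly, at the point where they need it.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(same_text(composed, decomposed))  # True
```

The comparison only succeeds after explicit normalization. Normalizing at
construction time would instead change one of the two inputs and break the
round trip.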

Regards,
Konstantin