[Development] Qt6: Adding UTF-8 storage support to QString

Konstantin Tokarev annulen at yandex.ru
Wed Jan 16 22:16:39 CET 2019



15.01.2019, 23:13, "Alexander Akulich" <akulichalexander at gmail.com>:
> Cristian,
>
> the previous discussion is "Why can't QString use UTF-8 internally?"
> There is something wrong with our mailing list; the best link I found is
> [1]. For some reason the link to the thread head [2] is broken.
>
> [1] https://lists.qt-project.org/pipermail/development/2015-February/040199.html

Note that easier character indexing is not a valid argument for using UTF-16 instead of UTF-8,
for two reasons:

1. Code points outside the Basic Multilingual Plane are encoded as surrogate pairs in UTF-16, e.g. this is the case
for most Emoji characters. QString ignores this fact and indexes 16-bit QChars. To make things worse, several QString
methods like left(), right(), and mid() will happily cut a surrogate pair in half.

2. When people talk about character indexing, they often mean indexing of grapheme clusters. In Unicode, a
grapheme cluster may be represented by several code points, depending on the normalization form of the source.
To make things worse, even in NFC not every grapheme cluster that is possible in Unicode is representable as a
single code point.
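
A minimal sketch of both points, assuming std::u16string as a stand-in for QString's 16-bit storage so that it
builds without Qt. The helper names (countCodePoints, naiveLeft, endsWithLoneHighSurrogate) are hypothetical and
are not Qt API; naiveLeft only imitates a left() that counts 16-bit units.

```cpp
#include <cstddef>
#include <string>

// High (leading) surrogates occupy U+D800..U+DBFF.
bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// Low (trailing) surrogates occupy U+DC00..U+DFFF.
bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts code points rather than 16-bit code units.
// An unpaired surrogate is counted as one code point.
std::size_t countCodePoints(const std::u16string &s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1]))
            ++i; // skip the trailing half of the pair
        ++n;
    }
    return n;
}

// Hypothetical stand-in for a left(n) that counts 16-bit units:
// it has no idea whether position n falls inside a surrogate pair.
std::u16string naiveLeft(const std::u16string &s, std::size_t n)
{
    return s.substr(0, n);
}

// True if the string ends in the first half of a surrogate pair,
// i.e. a cut went through the middle of a code point.
bool endsWithLoneHighSurrogate(const std::u16string &s)
{
    return !s.empty() && isHighSurrogate(s.back());
}
```

With these helpers, u"\U0001F600" (an Emoji) has size() 2 but one code point, and naiveLeft() with n = 1 leaves a
lone high surrogate. For point 2, u"e\u0301" (e plus a combining acute accent) is two code points that render as one
grapheme cluster, and u"\U0001F1E9\U0001F1EA" (a flag built from two regional indicators) is four 16-bit units and
two code points, with no single-code-point form in any normalization.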

> [2] https://lists.qt-project.org/pipermail/development/2015-February/020155.html


>
> On Tue, Jan 15, 2019 at 9:48 PM Cristian Adam <cristian.adam at gmail.com> wrote:
>>  Hi,
>>
>>  With every Qt release we see how the new release improved over previous releases in terms of speed, memory consumption, etc.
>>
>>  Any chance of having UTF-8 storage support for QString?
>>
>>  UTF-8 is native on Linux and other *NIX platforms, so Qt programs should use less memory and perform better by reading fewer bytes from memory.
>>
>>  Did anybody try this?
>>
>>  I've heard that Qt Creator stores source files both in UTF-8 for libclang and in UTF-16 for its internal usage. That sounds a bit wasteful.
>>
>>  KDE Plasma could then better compare / compete with the other Linux desktop environments which use UTF-8 for strings.
>>
>>  I guess I could use CopperSpice to test this, since they added CsString with both QString8 (UTF-8) and QString16 (UTF-16) supported.
>>
>>  https://utf8everywhere.org/ states "UTF-16 is the worst of both worlds, being both variable length and too wide"
>>
>>  Cheers,
>>  Cristian.
>>  _______________________________________________
>>  Development mailing list
>>  Development at qt-project.org
>>  https://lists.qt-project.org/listinfo/development
>

-- 
Regards,
Konstantin



