[Development] Qt6: Adding UTF-8 storage support to QString
Konstantin Tokarev
annulen at yandex.ru
Wed Jan 16 22:16:39 CET 2019
15.01.2019, 23:13, "Alexander Akulich" <akulichalexander at gmail.com>:
> Cristian,
>
> the previous discussion is "Why can't QString use UTF-8 internally?"
> There is something wrong with our mailing list, the best link I found is
> [1]. For some reason link to the thread head [2] is broken.
>
> [1] https://lists.qt-project.org/pipermail/development/2015-February/040199.html
Note that easier character indexing is not a valid argument for preferring UTF-16 over UTF-8.
1. Code points may be encoded as surrogate pairs in UTF-16; this is the case for Emoji characters, for example. QString
ignores this fact and indexes 16-bit QChars. To make things worse, several QString methods, such as left(), right(),
and mid(), will happily cut a surrogate pair in half.
2. When people talk about character indexing, they often mean indexing of grapheme clusters. In the Unicode world,
a grapheme cluster may be represented as several code points, depending on the normalization form of the source.
To make things worse, even in NFC not every grapheme cluster that is possible in Unicode can be represented as a
single code point.
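
Both points can be illustrated with a minimal sketch. It uses std::u16string rather than QString so that it
compiles without Qt; the helper names (countCodePoints, naiveLeft, endsWithLoneSurrogate) are hypothetical and
not part of any Qt API, and naiveLeft only imitates what cutting by 16-bit index does.

```cpp
#include <cstddef>
#include <string>

// True if the UTF-16 code unit is the first half of a surrogate pair.
bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if the UTF-16 code unit is the second half of a surrogate pair.
bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts Unicode code points in a UTF-16 string.
// Assumption: the input is well-formed UTF-16 (no unpaired surrogates).
std::size_t countCodePoints(const std::u16string &s)
{
    std::size_t n = 0;
    for (char16_t u : s) {
        if (!isLowSurrogate(u)) // a low surrogate continues the previous code point
            ++n;
    }
    return n;
}

// Hypothetical stand-in for a left(n) that counts 16-bit code units,
// with no awareness of surrogate pairs.
std::u16string naiveLeft(const std::u16string &s, std::size_t n)
{
    return s.substr(0, n);
}

// True if the string ends with a high surrogate that has lost its partner.
bool endsWithLoneSurrogate(const std::u16string &s)
{
    return !s.empty() && isHighSurrogate(s.back());
}
```

For the emoji u"\U0001F600", size() is 2 while countCodePoints() is 1, and naiveLeft(s, 1) leaves a lone high
surrogate. For point 2, u"e\u0301" ("e" plus a combining acute accent) is two code points but is perceived as
one character, whereas its NFC form u"\u00E9" is a single code point. Counting grapheme clusters properly needs
the Unicode segmentation rules, which this sketch does not attempt.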
> [2] https://lists.qt-project.org/pipermail/development/2015-February/020155.html
>
> On Tue, Jan 15, 2019 at 9:48 PM Cristian Adam <cristian.adam at gmail.com> wrote:
>> Hi,
>>
>> With every Qt release we see how the new release improved over previous releases in terms of speed, memory consumption, etc.
>>
>> Any chance of having UTF-8 storage support for QString?
>>
>> UTF-8 is native on Linux and other *NIX platforms, so Qt programs should use less memory and perform better by reading fewer bytes from memory.
>>
>> Did anybody try this?
>>
>> I've heard that Qt Creator stores source files both in UTF-8 for libclang and in UTF-16 for its internal usage. That sounds a bit wasteful.
>>
>> KDE Plasma could then better compare / compete with the other Linux desktop environments which use UTF-8 for strings.
>>
>> I guess I could use CopperSpice to test this, since they added CsString with both QString8 (UTF-8) and QString16 (UTF-16) supported.
>>
>> https://utf8everywhere.org/ states "UTF-16 is the worst of both worlds, being both variable length and too wide"
>>
>> Cheers,
>> Cristian.
>> _______________________________________________
>> Development mailing list
>> Development at qt-project.org
>> https://lists.qt-project.org/listinfo/development
>
--
Regards,
Konstantin