[Development] Qt6: Adding UTF-8 storage support to QString
Thiago Macieira
thiago.macieira at intel.com
Sat Jan 26 00:16:25 CET 2019
On Friday, 25 January 2019 13:39:49 PST Konstantin Tokarev wrote:
> > All living languages are supposed to be stored in the BMP, which means no
> > UTF-16 surrogate pairs to encode them.
>
> AFAIK all emojis are encoded with surrogate pairs
Emojis are not part of a living language. They're drawings. But yes, they're
outside the BMP.
In any case, they're often represented by more than one codepoint anyway, so
whether we used N*2 UTF-16 code units to represent them or N UTF-32 code
units, it makes no difference. Your code needs to know how to deal with them,
where to properly break, how to combine them, how to calculate the width, etc.
Also note that N codepoints outside the BMP take N*4 bytes in UTF-8, which
means that for those codepoints all three representations (UTF-8, UTF-16 and
UTF-32) take exactly the same amount of memory.
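To make the arithmetic concrete, here is a minimal sketch in plain C++ (no Qt) that adds up the storage a sequence of codepoints needs in each encoding. The names Sizes and encodedSizes are made up for this illustration and are not part of any Qt API. The input is assumed to contain only valid Unicode scalar values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: byte counts for one sequence in each encoding.
struct Sizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Sums the bytes needed to store the given codepoints in UTF-8, UTF-16 and
// UTF-32. Assumes every element is a valid Unicode scalar value (no
// surrogates, nothing above U+10FFFF); no validation is done.
Sizes encodedSizes(const std::u32string &codePoints)
{
    Sizes s{0, 0, 0};
    for (char32_t cp : codePoints) {
        // UTF-8: 1 to 4 bytes depending on the codepoint's magnitude.
        s.utf8 += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        // UTF-16: one 2-byte code unit in the BMP, a surrogate pair outside it.
        s.utf16 += cp < 0x10000 ? 2 : 4;
        // UTF-32: always one 4-byte code unit.
        s.utf32 += 4;
    }
    return s;
}
```

For a single codepoint outside the BMP, such as U+1F600, encodedSizes(U"\U0001F600") gives 4 bytes in all three encodings. The totals only drift apart when a sequence also contains BMP codepoints: U+200D ZERO WIDTH JOINER, used to glue some emoji together, costs 3 bytes in UTF-8, 2 in UTF-16 and 4 in UTF-32.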
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center