[Development] Qt6: Adding UTF-8 storage support to QString

Sat Jan 26 00:16:25 CET 2019

On Friday, 25 January 2019 13:39:49 PST Konstantin Tokarev wrote:
> > All living languages are supposed to be stored in the BMP, which means no
> > UTF-16 surrogate pairs to encode them.
> 
> AFAIK all emojis are encoded with surrogate pairs

Emojis are not part of a living language. They're drawings. But yes, they're 
outside the BMP.

In any case, they're often represented by more than one codepoint anyway, so 
whether we used N*2 UTF-16 code units to represent them or N UTF-32 code 
units, it makes no difference. Your code needs to know how to deal with them, 
where to properly break, how to combine them, how to calculate the width, etc.

Also note that each of those non-BMP codepoints takes 4 bytes in UTF-8, so 
for N of them all three representations take exactly the same amount of 
memory: N*4 bytes.
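
As a rough illustration of the arithmetic above, here is a minimal sketch in 
standard C++ (no Qt) that counts the bytes needed per code point in each 
encoding form. The names utf8Bytes, utf16Bytes, utf32Bytes, StorageSizes and 
storageSizes are made up for this example and are not Qt API; the functions 
assume their input is a valid Unicode scalar value.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store one Unicode code point in each encoding form.
// Assumption: cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
constexpr std::size_t utf8Bytes(char32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

constexpr std::size_t utf16Bytes(char32_t cp)
{
    return cp < 0x10000 ? 2 : 4; // one code unit, or a surrogate pair
}

constexpr std::size_t utf32Bytes(char32_t)
{
    return 4; // always one 32-bit code unit
}

// U+1F600 GRINNING FACE lies outside the BMP: 4 bytes in every form.
static_assert(utf8Bytes(U'\U0001F600') == 4, "UTF-8");
static_assert(utf16Bytes(U'\U0001F600') == 4, "UTF-16");
static_assert(utf32Bytes(U'\U0001F600') == 4, "UTF-32");

// Hypothetical helper: total storage for a whole sequence of code points.
struct StorageSizes { std::size_t utf8, utf16, utf32; };

inline StorageSizes storageSizes(const std::u32string &text)
{
    StorageSizes total{0, 0, 0};
    for (char32_t cp : text) {
        total.utf8 += utf8Bytes(cp);
        total.utf16 += utf16Bytes(cp);
        total.utf32 += utf32Bytes(cp);
    }
    return total;
}
```

Calling storageSizes(U"\U0001F600\U0001F601\U0001F602") gives 12 bytes in all 
three fields, i.e. N*4 for N = 3. Any BMP code point mixed into a sequence 
(U+200D ZERO WIDTH JOINER, for instance) would count as 3, 2 and 4 bytes 
respectively, so the equality applies to the non-BMP code points only.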

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center