[Development] Why can't QString use UTF-8 internally?

Thiago Macieira thiago.macieira at intel.com
Tue Feb 10 21:03:01 CET 2015


On Tuesday 10 February 2015 22:58:58 Konstantin Ritt wrote:
> 16 bits is completely enough for most spoken languages (see the

s/most/all/

All *living* languages are encoded in the BMP (Basic Multilingual Plane, 
U+0000 to U+FFFF). The SMP (Supplementary Multilingual Plane) and other planes 
contain only dead languages (Egyptian hieroglyphs, Linear A, Linear B, etc.), 
plus some extended math symbols, emoticons, and similar symbols.
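
To illustrate the BMP/SMP split in UTF-16 terms, here is a minimal sketch 
in plain C++ (not Qt API). The function name encodeUtf16 is hypothetical, 
and the code assumes its input is a valid Unicode scalar value (it does not 
reject surrogates or values above U+10FFFF). A BMP code point fits in one 
16-bit code unit; anything above U+FFFF needs a surrogate pair.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16.
// Assumes cp is a valid scalar value (not a surrogate, <= U+10FFFF).
std::u16string encodeUtf16(char32_t cp)
{
    std::u16string out;
    if (cp <= 0xFFFF) {
        // BMP: the code point is stored directly in one code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return out;
}
```

With this sketch, U+0416 (Cyrillic, BMP) yields one code unit, while 
U+13000 (an Egyptian hieroglyph, SMP) yields the two units 0xD80C 0xDC00.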

> Unicode's Blocks.txt and/or Scripts.txt for an approximate list), whereas
> a single 8-bit code unit only covers ASCII.
> Despite what http://utf8everywhere.org/#conclusions says, UTF-16 is not
> the worst choice; it is a trade-off between performance and memory
> consumption in the most common use case (spoken languages and mixed
> scripts).
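
The memory side of the trade-off mentioned in the quote can be sketched 
with per-code-point byte counts. This is a rough illustration in plain C++, 
not a measurement of QString; the function names utf8Bytes, utf16Bytes and 
totalBytes are hypothetical, and input is assumed to be valid code points.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode one code point in UTF-8 (1 to 4 bytes).
constexpr std::size_t utf8Bytes(char32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed to encode one code point in UTF-16 (2 bytes, or 4 for a
// surrogate pair outside the BMP).
constexpr std::size_t utf16Bytes(char32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}

// Sum a per-code-point size function over a whole UTF-32 string.
template <typename SizeFn>
std::size_t totalBytes(const std::u32string &text, SizeFn bytesFor)
{
    std::size_t total = 0;
    for (char32_t cp : text)
        total += bytesFor(cp);
    return total;
}
```

Under these counts, ASCII text is half the size in UTF-8, scripts in the 
U+0080 to U+07FF range (for example Cyrillic) are the same size in both, 
and the rest of the BMP (for example CJK) takes 3 bytes per code point in 
UTF-8 against 2 in UTF-16.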

Blocks file: ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
