[Development] Why can't QString use UTF-8 internally?
Thiago Macieira
thiago.macieira at intel.com
Tue Feb 10 21:03:01 CET 2015
On Tuesday 10 February 2015 22:58:58 Konstantin Ritt wrote:
> 16 bits is completely enough for most spoken languages (see the
s/most/all/
All *living* languages are encoded in the BMP. The SMP and other planes
contain only dead languages (Egyptian hieroglyphs, Linear A, Linear B, etc.),
plus some extended math symbols, emoticons, and other similar stuff.
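
To illustrate the BMP/SMP split in UTF-16 terms, here is a minimal sketch in plain standard C++ (no Qt). The helper names planeOf, utf16Units, highSurrogate and lowSurrogate are made up for this example and are not part of any library. A code point in the BMP (plane 0) fits in one 16-bit code unit; anything in the SMP or higher planes needs a surrogate pair.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not a Qt or library API.

// Unicode plane of a code point: 0 is the BMP, 1 is the SMP, and so on.
constexpr int planeOf(std::uint32_t codePoint)
{
    return static_cast<int>(codePoint >> 16);
}

// Number of 16-bit code units UTF-16 needs for one code point:
// 1 inside the BMP, 2 (a surrogate pair) for planes 1 through 16.
constexpr int utf16Units(std::uint32_t codePoint)
{
    return codePoint < 0x10000 ? 1 : 2;
}

// High (lead) surrogate of a code point outside the BMP.
constexpr std::uint16_t highSurrogate(std::uint32_t codePoint)
{
    return static_cast<std::uint16_t>(0xD800 + ((codePoint - 0x10000) >> 10));
}

// Low (trail) surrogate of a code point outside the BMP.
constexpr std::uint16_t lowSurrogate(std::uint32_t codePoint)
{
    return static_cast<std::uint16_t>(0xDC00 + ((codePoint - 0x10000) & 0x3FF));
}

// U+4E2D (a CJK ideograph) is in the BMP: one code unit.
static_assert(planeOf(0x4E2D) == 0 && utf16Units(0x4E2D) == 1, "BMP");
// U+13000 (an Egyptian hieroglyph) is in the SMP: two code units.
static_assert(planeOf(0x13000) == 1 && utf16Units(0x13000) == 2, "SMP");
```

The two surrogate helpers are only meaningful for code points at or above U+10000; for example, U+1F600 (an emoticon) splits into 0xD83D followed by 0xDE00.
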
> Unicode's Blocks.txt and/or Scripts.txt for an approximated list), whereas
> 8 bits encoding only covers ASCII.
> Despite what http://utf8everywhere.org/#conclusions says, UTF-16 is not
> the worst choice; it is a trade-off between the performance and the memory
> consumption in the most-common use case (spoken languages and mixed
> scripts).
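
The memory side of the trade-off quoted above can be made concrete with a small sketch, again in plain standard C++. The function names utf8Bytes and utf16Bytes are hypothetical and exist only for this example. They return how many bytes one code point occupies in each encoding.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; not a Qt or library API.

// Bytes needed to encode one code point in UTF-8.
constexpr std::size_t utf8Bytes(std::uint32_t codePoint)
{
    return codePoint < 0x80    ? 1   // ASCII
         : codePoint < 0x800   ? 2   // Latin supplements, Greek, Cyrillic, ...
         : codePoint < 0x10000 ? 3   // rest of the BMP, including CJK
         :                       4;  // supplementary planes
}

// Bytes needed to encode one code point in UTF-16.
constexpr std::size_t utf16Bytes(std::uint32_t codePoint)
{
    return codePoint < 0x10000 ? 2   // one code unit in the BMP
         :                       4;  // surrogate pair outside the BMP
}

static_assert(utf8Bytes(0x0041) == 1 && utf16Bytes(0x0041) == 2, "ASCII 'A'");
static_assert(utf8Bytes(0x0436) == 2 && utf16Bytes(0x0436) == 2, "Cyrillic");
static_assert(utf8Bytes(0x4E2D) == 3 && utf16Bytes(0x4E2D) == 2, "CJK");
```

Under these rules, pure ASCII text is half the size in UTF-8, Cyrillic or Greek text is the same size in both, and CJK text is 50% larger in UTF-8. Only ASCII fits in a single 8-bit code unit, while every BMP code point fits in a single 16-bit code unit.
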
Blocks file: ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center