[Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings

Thu Oct 15 16:52:43 CEST 2015

Op 15-10-2015 om 14:52 schreef Konstantin Ritt:
>
>
> For everything but US-ASCII / Latin-1, UTF-8 isn't faster than UTF-16 
> (feel free to compare their complexity against UTF-32).
> And why "pure Chinese signs" again? Did you ever look into the 
> Unicode's Scripts.txt [1], for example? It clearly shows UTF-16 covers 
> [almost] all spoken languages, without any performance hits 
> (compared to UTF-8), and all we have to pay is an extra byte for every 
> Base Latin character (again, compared to UTF-8).
>
> [1] http://www.unicode.org/Public/8.0.0/ucd/Scripts.txt
>
"All we have to pay"? Isn't that quite a significant cost, if every 
other byte in your data is going to be null? Doesn't that impact cache 
lines? Doesn't that impact how many characters you can stuff into a 
string with SSO (as planned for Qt6?) and thus the point at which you 
need to start allocating? I would certainly expect an impact on things 
like XML parsing speed. But true enough, I say that without having 
measured it.
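
To put rough numbers on the cache-line and SSO questions, here is a 
minimal sketch in plain C++17 (no Qt). It assumes pure US-ASCII input, 
and the 24-byte inline buffer used in the note below is a made-up figure 
for illustration, not the planned Qt6 layout. The names utf8Bytes, 
utf16Bytes and ssoCapacity are hypothetical helpers, not Qt API.

```cpp
#include <cstddef>
#include <string_view>

// Bytes needed to store pure US-ASCII text as UTF-8:
// one 1-byte code unit per character.
constexpr std::size_t utf8Bytes(std::string_view asciiText) {
    return asciiText.size() * sizeof(char);
}

// Bytes needed to store the same pure US-ASCII text as UTF-16:
// one 2-byte code unit per character, the high byte always zero.
constexpr std::size_t utf16Bytes(std::string_view asciiText) {
    return asciiText.size() * sizeof(char16_t);
}

// Number of characters that fit into a hypothetical inline (SSO) buffer
// of bufferBytes bytes, keeping one code unit for the terminator.
constexpr std::size_t ssoCapacity(std::size_t bufferBytes,
                                  std::size_t codeUnitBytes) {
    const std::size_t units = bufferBytes / codeUnitBytes;
    return units > 0 ? units - 1 : 0;
}
```

Under those assumptions, ssoCapacity(24, sizeof(char)) gives 23 
characters before an allocation is needed, while 
ssoCapacity(24, sizeof(char16_t)) gives 11. The same factor of two 
applies to how much ASCII markup fits into one cache line. Whether that 
shows up in real parsing speed would still have to be measured.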

I also think UTF-8 should be the default for saving QStrings to files. 
Many years ago, I was working on an application for a very constrained 
mobile device dealing with maps. Converting the map format to use UTF-8 
instead of UTF-16 for its contained strings (of which there were many) 
caused a very significant reduction in file size (increasing the map 
area the users could take with them on the road) and in loading time 
(impacting UX very positively).
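
How much a file shrinks depends on the mix of characters in it. The 
following is a small sketch, again in plain C++17 rather than Qt, that 
counts the encoded size of a string in both encodings using the standard 
per-code-point lengths. The function names are hypothetical, and it 
assumes the input contains only valid Unicode scalar values.

```cpp
#include <cstddef>
#include <string_view>

// Bytes one code point occupies in UTF-8 (1 to 4).
constexpr std::size_t utf8Length(char32_t cp) {
    if (cp < 0x80) return 1;      // US-ASCII
    if (cp < 0x800) return 2;     // e.g. Latin-1 supplement, Greek, Cyrillic
    if (cp < 0x10000) return 3;   // rest of the BMP, e.g. most CJK
    return 4;                     // supplementary planes
}

// Bytes one code point occupies in UTF-16 (2, or 4 for a surrogate pair).
constexpr std::size_t utf16Length(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Total payload size of the text when written out as UTF-8.
constexpr std::size_t utf8FileBytes(std::u32string_view text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += utf8Length(cp);
    return total;
}

// Total payload size of the same text when written out as UTF-16.
constexpr std::size_t utf16FileBytes(std::u32string_view text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += utf16Length(cp);
    return total;
}
```

For an ASCII label such as U"Main Street" this gives 11 bytes in UTF-8 
against 22 in UTF-16. For a two-character CJK name such as 
U"\u6771\u4EAC" it is the other way around, 6 bytes against 4, which is 
the case Konstantin is pointing at. Which effect dominates depends on 
the data.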

André