[Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings

André Somers andre at familiesomers.nl
Thu Oct 15 16:52:43 CEST 2015


On 15-10-2015 at 14:52, Konstantin Ritt wrote:
>
>
> For everything but US-ASCII / Latin-1, UTF-8 isn't faster than UTF-16
> (feel free to compare their complexity against UTF-32).
> And why "pure Chinese signs" again? Did you ever look at Unicode's
> Scripts.txt [1], for example? It clearly shows UTF-16 covers
> [almost] all spoken languages, without any performance hits
> (compared to UTF-8), and all we have to pay is one extra byte per
> Basic Latin character (compared to UTF-8, again).
>
> [1] http://www.unicode.org/Public/8.0.0/ucd/Scripts.txt
>
"All we have to pay"? Isn't that quite a significant cost, if every
other byte in your data is going to be null? Doesn't that impact cache
line usage? Doesn't that impact how many characters you can fit into a
string with SSO (as planned for Qt 6?), and thus the point at which you
need to start allocating? I would certainly expect an impact on things
like XML parsing speed. But true enough, I say that without having
measured it.

I also think UTF-8 should be the default for saving QStrings to files.
Many years ago, I was working on an application for a very constrained
mobile device that dealt with maps. Converting the map format to use
UTF-8 instead of UTF-16 for its (many) contained strings caused a very
significant reduction in file size (increasing the map area users could
take with them on the road) and in loading time (improving UX
considerably).
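
To make the size argument concrete, here is a minimal sketch in plain
C++ (no Qt) that counts how many bytes the same text needs in each
encoding. The helper names utf8ByteCount and utf16ByteCount are made up
for this illustration and are not Qt API; the sketch assumes well-formed
UTF-16 input and says nothing about speed, only about storage.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Qt API): number of bytes the UTF-16 string
// `s` would occupy once encoded as UTF-8. Assumes well-formed UTF-16.
std::size_t utf8ByteCount(const std::u16string &s)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;   // US-ASCII
        } else if (c < 0x800) {
            bytes += 2;   // e.g. Latin-1 supplement, Greek, Cyrillic
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size()) {
            bytes += 4;   // surrogate pair: one code point outside the BMP
            ++i;          // skip the low surrogate
        } else {
            bytes += 3;   // rest of the BMP, e.g. CJK
        }
    }
    return bytes;
}

// Hypothetical helper: bytes occupied by the UTF-16 code units themselves.
std::size_t utf16ByteCount(const std::u16string &s)
{
    return s.size() * sizeof(char16_t);
}
```

For a pure US-ASCII place name such as u"Amsterdam" this gives 9 bytes
as UTF-8 against 18 as UTF-16, which is the doubling I mean above. For
CJK text the balance goes the other way: a character that takes 2 bytes
in UTF-16 takes 3 in UTF-8. Which effect dominates depends on the data.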

André




