[Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
André Somers
andre at familiesomers.nl
Thu Oct 15 16:52:43 CEST 2015
On 15-10-2015 at 14:52, Konstantin Ritt wrote:
>
>
> For everything but US-ASCII / Latin-1, UTF-8 isn't faster than UTF-16
> (feel free to compare their complexity against UTF-32).
> And why "pure Chinese signs" again? Did you ever look into the
> Unicode's Scripts.txt [1], for example? It clearly shows UTF-16 covers
> [almost] all spoken languages, without any performance hits
> (compared to UTF-8), and all we have to pay is an extra byte for every
> Basic Latin character (compared to UTF-8, again).
>
> [1] http://www.unicode.org/Public/8.0.0/ucd/Scripts.txt
>
"All we have to pay"? Isn't that quite a significant cost, if every
other byte of your data is going to be null? Doesn't that impact cache
lines? Doesn't that impact how many characters you can fit into a
string with SSO (as planned for Qt6?), and thus the point at which you
need to start allocating? I would certainly expect an impact on things
like XML parsing speed. But true enough, I say that without having
measured it.

I also think UTF-8 should be the default for saving QStrings to files.
Many years ago, I was working on an application for a very constrained
mobile device dealing with maps. Converting the map format to use UTF-8
instead of UTF-16 for its contained strings (of which there were many)
caused a very significant reduction in file size (increasing the map
area users could take with them on the road) and in loading time
(improving UX considerably).
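To put rough numbers on the size argument, here is a minimal sketch in
standard C++ (no Qt involved). The helper names utf16Bytes and utf8Bytes
are hypothetical, made up for this illustration, and the code assumes
well-formed UTF-16 input. It only counts storage bytes. It says nothing
about parsing speed or cache behaviour, which would still need measuring.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes needed to store the text as UTF-16,
// i.e. one char16_t per code unit.
std::size_t utf16Bytes(const std::u16string &s) {
    return s.size() * sizeof(char16_t);
}

// Hypothetical helper: bytes needed to store the same text as UTF-8,
// computed from the UTF-16 code units without doing the conversion.
// Assumption: the input is well-formed, so every high surrogate is
// followed by its low surrogate.
std::size_t utf8Bytes(const std::u16string &s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c < 0x80) {
            n += 1;                       // US-ASCII
        } else if (c < 0x800) {
            n += 2;                       // e.g. Latin-1 supplement, Greek, Cyrillic
        } else if (c >= 0xD800 && c <= 0xDBFF) {
            n += 4;                       // surrogate pair: one code point above U+FFFF
            ++i;                          // skip the low surrogate
        } else {
            n += 3;                       // rest of the BMP, e.g. most CJK
        }
    }
    return n;
}
```

For ASCII-only text such as u"Qt", utf16Bytes gives 4 and utf8Bytes
gives 2, which is the "every other byte is null" case. For a single BMP
CJK character such as U+4E2D the balance tips the other way: 2 bytes in
UTF-16 against 3 in UTF-8.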
André