[Development] Qt6: Adding UTF-8 storage support to QString

Fri Jan 25 17:39:45 CET 2019

> By all means, let's make sure the internals are efficient for the more
> common languages and scripts; but it's way past time to start doing
> Unicode properly, so that all cultures are well-served by default, when
> the software folk are using is built on Qt,

I don't think anyone knows what "properly" is. But the more I think about it, the more I like the idea I floated earlier: storing a string as a list of sequences of different character widths. I think it strikes a good balance between space and efficiency. To recap:
A class that stores a list of lists of same-width characters. In the simplest case the outer list has a single entry containing only 8-bit characters, which performs identically to QByteArray. Text in non-ASCII languages that needs 16-bit storage is stored the way QString stores it now. In more complicated scenarios (emoji in ASCII text, say), it breaks the text into 8-bit and 16-bit segments and makes them appear contiguous. Of course there could be functions to collapse everything to the widest width in use (maximize()) or to break it apart to minimize() space (for very long 8-bit strings with occasional wider characters), and there could even be a bestFit() heuristic. And as always you can get it serialized as UTF-8 or UTF-16. All of the above extends to 32-bit as well. I think this handles the average case very well (all characters of the same width) and has a reasonable cost for occasional exotic characters.
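
To make the layout described above a bit more concrete, here is a rough sketch in plain standard C++. Nothing in it is existing Qt API: the class name SegmentedString and its members are hypothetical, and it makes a few assumptions of its own. Each stored unit holds one whole code point (no surrogate pairs), so a segment is 1, 2 or 4 bytes wide depending on whether its code points fit in 0xFF, 0xFFFF or need more. Input is assumed to be valid code points, and bestFit() is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not a Qt class: a string kept as runs of same-width code points.
class SegmentedString {
public:
    // Append one code point, opening a new segment whenever the required width changes.
    void append(char32_t cp) {
        const std::size_t w = widthFor(cp);
        if (segments_.empty() || segments_.back().width != w)
            segments_.push_back(Segment{w, {}});
        pushUnit(segments_.back(), cp);
        ++length_;
    }

    std::size_t size() const { return length_; }
    std::size_t segmentCount() const { return segments_.size(); }

    // Bytes used for character data only (ignores per-segment bookkeeping).
    std::size_t storageBytes() const {
        std::size_t total = 0;
        for (const Segment &seg : segments_) total += seg.bytes.size();
        return total;
    }

    // Indexing walks the segments, so the text appears contiguous to the caller.
    char32_t at(std::size_t i) const {
        for (const Segment &seg : segments_) {
            const std::size_t n = count(seg);
            if (i < n) return readUnit(seg, i);
            i -= n;
        }
        throw std::out_of_range("SegmentedString::at");
    }

    // Collapse everything into one segment of the widest width in use.
    void maximize() {
        if (segments_.empty()) return;
        std::size_t widest = 1;
        for (const Segment &seg : segments_) widest = std::max(widest, seg.width);
        Segment merged{widest, {}};
        for (const Segment &seg : segments_)
            for (std::size_t k = 0; k < count(seg); ++k)
                pushUnit(merged, readUnit(seg, k));
        segments_.assign(1, merged);
    }

    // Re-split into the narrowest possible runs.
    void minimize() {
        SegmentedString out;
        for (const Segment &seg : segments_)
            for (std::size_t k = 0; k < count(seg); ++k)
                out.append(readUnit(seg, k));
        *this = out;
    }

    // Serialize to UTF-8 (assumes every stored value is a valid code point).
    std::string toUtf8() const {
        std::string out;
        for (const Segment &seg : segments_) {
            for (std::size_t k = 0; k < count(seg); ++k) {
                const std::uint32_t cp = readUnit(seg, k);
                if (cp < 0x80) {
                    out += static_cast<char>(cp);
                } else if (cp < 0x800) {
                    out += static_cast<char>(0xC0 | (cp >> 6));
                    out += static_cast<char>(0x80 | (cp & 0x3F));
                } else if (cp < 0x10000) {
                    out += static_cast<char>(0xE0 | (cp >> 12));
                    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                    out += static_cast<char>(0x80 | (cp & 0x3F));
                } else {
                    out += static_cast<char>(0xF0 | (cp >> 18));
                    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                    out += static_cast<char>(0x80 | (cp & 0x3F));
                }
            }
        }
        return out;
    }

private:
    struct Segment {
        std::size_t width;                 // bytes per character: 1, 2 or 4
        std::vector<unsigned char> bytes;  // little-endian units, back to back
    };

    static std::size_t widthFor(char32_t cp) {
        return cp <= 0xFF ? 1 : (cp <= 0xFFFF ? 2 : 4);
    }
    static std::size_t count(const Segment &seg) { return seg.bytes.size() / seg.width; }

    static void pushUnit(Segment &seg, char32_t cp) {
        const std::uint32_t v = static_cast<std::uint32_t>(cp);
        assert(seg.width == 4 || (v >> (8 * seg.width)) == 0);  // value must fit the segment
        for (std::size_t i = 0; i < seg.width; ++i)
            seg.bytes.push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xFF));
    }
    static char32_t readUnit(const Segment &seg, std::size_t index) {
        std::uint32_t v = 0;
        for (std::size_t i = 0; i < seg.width; ++i)
            v |= static_cast<std::uint32_t>(seg.bytes[index * seg.width + i]) << (8 * i);
        return static_cast<char32_t>(v);
    }

    std::vector<Segment> segments_;
    std::size_t length_ = 0;
};
```

With this sketch, appending 'h', 'i', U+1F600 and '!' yields three segments (8-bit, 32-bit, 8-bit) holding 7 bytes of character data. Calling maximize() merges them into a single 32-bit segment of 16 bytes, and minimize() splits it back. Indexing with at() is linear in the number of segments here; a real implementation would probably want an offset table, but that is beyond what the recap above specifies.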