[Development] Qt6: Adding UTF-8 storage support to QString

Thiago Macieira thiago.macieira at intel.com
Thu Jan 24 22:57:18 CET 2019


On Wednesday, 23 January 2019 23:32:28 PST Olivier Goffart wrote:
>   - Introduce some iterator that iterates over unicode code points.

I wrote that about a decade ago. It's called QStringIterator and it's inside 
our sources, but in a private header.

But we may want to make it iterate over grapheme clusters instead of Unicode 
code points. That is, make it use QTextBoundaryFinder to iterate, instead of 
decoding the storage to UTF-32.
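
For illustration, iterating over code points on UTF-16 storage mostly comes down to combining surrogate pairs. The following is a minimal standalone sketch in standard C++, not the private QStringIterator. The name codePoints and the choice to replace lone surrogates with U+FFFD are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not Qt API): decode UTF-16 code units into Unicode
// code points, combining each valid surrogate pair into one code point.
std::vector<char32_t> codePoints(const std::u16string &s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        const bool isHigh = c >= 0xD800 && c <= 0xDBFF;
        if (isHigh && i + 1 < s.size()) {
            const char32_t low = s[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Valid pair: consume both code units.
                out.push_back(0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00));
                ++i;
                continue;
            }
        }
        // Assumption for this sketch: a lone surrogate becomes U+FFFD.
        if (c >= 0xD800 && c <= 0xDFFF)
            c = 0xFFFD;
        out.push_back(c);
    }
    return out;
}
```

With this helper, u"e\u0301" (an "e" followed by a combining acute accent) yields two code points although a user perceives one character. That is the gap that iterating by grapheme cluster would close.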

>   - Deprecate utf16()  and other API that assume that QString is UTF-16
>   - Replace them by a toUtf16 which returns a QVector<ushort>.  I believe
> that it is possible to make the content implicitly shared with the QString,
> avoiding copies. (since it is just a QTypedArrayData internally)

Make that QVector<char16_t>.

Sharing QVector and QString is possible, but we need to fix a few 
discrepancies, especially that of QVector not being allowed to be raw data, 
while QString can be (QVector::fromRawData was proposed for Qt 5.0 [Andreas 
Hartmetz, if I'm not mistaken] but we never added it). So this is fixable for 
Qt 6, but not before Qt 6.

I think I even tried this in my branch and ran into a lot of trouble. It was a 
non-obvious change, so I abandoned it.

Still, we're not going to switch away from UTF-16 in Qt 6. The best we can do 
is pave the way for switching in Qt 7: add the methods you're talking about and 
change ALL the Windows, Cocoa and Android code that calls .data() and assumes 
it to be UTF-16 so that it calls toUtf16() instead. We may want to have some 
#defines, like the QStringView string level or the ASCII-cast ones, so we catch 
those.
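
To make the migration concern concrete, here is a hedged sketch of what an explicit toUtf16() conversion could look like once storage is no longer UTF-16. Utf8String is a hypothetical stand-in written in standard C++, not a Qt class or a proposed Qt API. Its decoder replaces malformed input with U+FFFD and, to stay short, does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical UTF-8-backed string, for illustration only.
class Utf8String
{
public:
    explicit Utf8String(std::string utf8) : m_data(std::move(utf8)) {}

    // Raw storage is UTF-8 here; callers must not assume UTF-16.
    const char *data() const { return m_data.data(); }

    // Explicit conversion for platform APIs that require UTF-16.
    std::vector<char16_t> toUtf16() const
    {
        std::vector<char16_t> out;
        const std::size_t n = m_data.size();
        std::size_t i = 0;
        while (i < n) {
            const unsigned char b0 = static_cast<unsigned char>(m_data[i]);
            char32_t cp = 0xFFFD;
            std::size_t len = 1;
            if (b0 < 0x80) { cp = b0; }
            else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
            else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
            else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }

            // A single byte is valid only if ASCII; longer sequences must fit.
            bool ok = (len == 1) ? (b0 < 0x80) : (i + len <= n);
            for (std::size_t k = 1; ok && k < len; ++k) {
                const unsigned char b = static_cast<unsigned char>(m_data[i + k]);
                if ((b & 0xC0) != 0x80)
                    ok = false;                 // not a continuation byte
                else
                    cp = (cp << 6) | (b & 0x3F);
            }
            if (!ok || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                cp = 0xFFFD;                    // malformed: replace, skip 1 byte
                len = 1;
            }

            if (cp < 0x10000) {
                out.push_back(static_cast<char16_t>(cp));
            } else {
                cp -= 0x10000;                  // encode as a surrogate pair
                out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
                out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
            }
            i += len;
        }
        return out;
    }

private:
    std::string m_data;
};
```

Code that today passes .data() straight to a UTF-16 platform API would instead call toUtf16() and pass the resulting buffer. That conversion step is what a compile-time #define could help flag at the remaining call sites.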

But we WILL NOT change from UTF-16 in the next 2 years. 

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





