[Development] Qt6: Adding UTF-8 storage support to QString
Thiago Macieira
thiago.macieira at intel.com
Thu Jan 24 22:57:18 CET 2019
On Wednesday, 23 January 2019 23:32:28 PST Olivier Goffart wrote:
> - Introduce some iterator that iterates over unicode code points.
I wrote that about a decade ago. It's called QStringIterator and it's inside
our sources, but in a private header.
But we may want to make it iterate over grapheme clusters instead of Unicode
code points. That is, make it use QTextBoundaryFinder to iterate, instead of
decoding the storage to UTF-32.
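To make the code point versus grapheme cluster distinction concrete, here is a minimal sketch in standard C++ of code point iteration over UTF-16 storage. It is not Qt code and is not taken from QStringIterator; the function name codePoints and the choice to replace unpaired surrogates with U+FFFD are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-16 code units into Unicode code points.
// Unpaired surrogates become U+FFFD (an assumption of this sketch).
std::vector<char32_t> codePoints(const std::u16string &s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: it needs a low surrogate right after it.
            if (i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                const char32_t low = s[++i];
                u = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00);
            } else {
                u = 0xFFFD;
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            u = 0xFFFD;
        }
        out.push_back(u);
    }
    return out;
}
```

With this, u"e\u0301" (a base letter followed by a combining acute accent) decodes to two code points even though a user sees a single character, which is the gap that grapheme cluster iteration would close.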
> - Deprecate utf16() and other API that assume that QString is UTF-16
> - Replace them by a toUtf16 which returns a QVector<ushort>. I believe
> that it is possible to make the content implicitly shared with the QString,
> avoiding copies. (since it is just a QTypedArrayData internally)
QVector<char16_t>.
Sharing QVector and QString is possible, but we need to fix a few
discrepancies, especially that of QVector not being allowed to be raw data,
while QString can be (QVector::fromRawData was proposed for Qt 5.0 [Andreas
Hartmetz, if I'm not mistaken] but we never added it). So this is fixable for
Qt 6, but not before Qt 6.
I think I even tried this in my branch and ran into a lot of trouble. It was a
non-obvious change. So I abandoned it.
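For context, the zero-copy sharing discussed above only works while the storage itself is UTF-16. If the storage were UTF-8, a toUtf16()-style call would have to transcode into a new buffer. A minimal sketch of that conversion in standard C++ follows; it is not Qt code, the free function toUtf16 is a hypothetical stand-in for the proposed method, and it assumes the input is well-formed UTF-8 (no validation is done).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposed toUtf16(): transcode UTF-8 bytes
// into UTF-16 code units. Assumes well-formed input; no validation is done.
std::vector<char16_t> toUtf16(const std::string &utf8)
{
    std::vector<char16_t> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i++]);
        char32_t cp;
        int extra; // number of continuation bytes that follow the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else                            { cp = lead & 0x07; extra = 3; }
        for (int k = 0; k < extra && i < utf8.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);

        if (cp >= 0x10000) {
            // Outside the BMP: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

The returned vector owns freshly allocated memory, so in that scenario callers could no longer treat the result as a view into the string's own buffer.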
Still, we're not going to switch away from UTF-16 in Qt 6. The best we can do
is pave the way for switching in Qt 7, if we add the methods you're talking
about and change ALL the Windows, Cocoa and Android code that calls .data() and
assumes it to be UTF-16 so that it calls toUtf16() instead. We may want to have
some #defines like the QStringView string level or the ASCII-cast ones, so we
catch those.
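As a rough illustration of the #define idea, the sketch below uses a toy string class in standard C++. The class String and the macro SKETCH_NO_UTF16_DATA are hypothetical names invented for this example and are not Qt API; the pattern only mirrors how opt-in macros such as QT_NO_CAST_FROM_ASCII hide an API so that remaining callers fail to compile.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy string class, not QString. Storage happens to be UTF-16 here.
class String
{
public:
    explicit String(std::u16string s) : d(std::move(s)) {}

    // Encoding-explicit accessor: callers state that they want UTF-16.
    std::vector<char16_t> toUtf16() const
    {
        return std::vector<char16_t>(d.begin(), d.end());
    }

#ifndef SKETCH_NO_UTF16_DATA
    // Legacy accessor that exposes the storage and implies it is UTF-16.
    // Building with -DSKETCH_NO_UTF16_DATA removes it, so every remaining
    // caller turns into a compile error that can be ported to toUtf16().
    const char16_t *data() const { return d.data(); }
#endif

private:
    std::u16string d;
};
```

Platform code could then be built with the macro defined, one module at a time, to find the call sites that still assume the storage encoding.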
But we WILL NOT change from UTF-16 in the next 2 years.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center