[Development] Why can't QString use UTF-8 internally?
Thiago Macieira
thiago.macieira at intel.com
Tue Feb 10 22:26:50 CET 2015
On Wednesday 11 February 2015 00:37:41 Konstantin Ritt wrote:
> Yes, that would be an ideal solution. Unfortunately, that would also break
> a LOT of existing code.
> In Qt4 times, I was doing some experiments with the QString adaptive
> storage (similar to what NSString does behind the scenes).
I've thought of this too.
This stumbles on QString's implicit sharing. If you do this:
    QString foo = "some UTF-8 text";
    QString copy = foo;
    qDebug() << foo.constData()[0];
The last line calls the const function constData(), which must return UTF-16
data. If the original QString stored only UTF-8 internally, it could not do
that: producing the UTF-16 form would mean writing, from inside a const
function, to memory that is shared with "copy".
It's not insurmountable. I can think of two solutions:
1) Pre-allocate enough space for the UTF-16 data (strlen(utf8) * 2 bytes), so
that the const functions can implicitly write to the UTF-16 block when needed.
Since the original UTF-8 data is constant, every thread would write exactly the
same values, so as long as there are no out-of-thin-air values, multiple
threads could safely do this conversion simultaneously.
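A quick check of the size bound in option 1: a UTF-8 sequence of 1 to 3 bytes
decodes to one UTF-16 code unit and a 4-byte sequence to a surrogate pair of
two units, so n bytes of UTF-8 never need more than n code units (2n bytes).
The sketch below counts the units from the lead bytes alone; the function
names are hypothetical and it assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts the UTF-16 code units needed for a valid UTF-8
// string by looking only at lead bytes (continuation bytes, 10xxxxxx, add nothing).
std::size_t utf16UnitsForUtf8(const char *utf8)
{
    std::size_t units = 0;
    for (const unsigned char *p = reinterpret_cast<const unsigned char *>(utf8); *p; ++p) {
        if ((*p & 0xC0) == 0x80)
            continue;                    // continuation byte
        units += (*p >= 0xF0) ? 2 : 1;   // 4-byte sequences become a surrogate pair
    }
    return units;
}

// Hypothetical helper: bytes to pre-allocate for the UTF-16 block in option 1,
// i.e. one char16_t per UTF-8 byte (strlen(utf8) * 2 on usual platforms).
std::size_t utf16BlockBytes(const char *utf8)
{
    return std::strlen(utf8) * sizeof(char16_t);
}
```

For a pure-ASCII string of n bytes, option 1 therefore keeps n bytes of UTF-8
plus 2n bytes reserved for UTF-16, 3n bytes in total, versus 2n bytes for a
string stored only as UTF-16.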
2) Indirect the UTF-16 data via an extra, atomic pointer. When a thread finds
that it needs the UTF-16 data and it isn't there yet, it allocates memory, does
the conversion, then publishes the result with testAndSetRelease on the pointer
(similar to the Qt 4 Q_GLOBAL_STATIC).
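Option 2 could look roughly like the sketch below, written with standard C++
atomics instead of Qt's QAtomicPointer so it stands alone. The class name
LazyUtf16Data and the converter are hypothetical, implicit sharing is not
modelled (copying is disabled), and the input is assumed to be valid UTF-8.
compare_exchange_strong plays the role of testAndSetRelease; it uses acq_rel on
success and acquire on failure so that a thread losing the race can safely read
the winner's buffer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: decodes UTF-8 to UTF-16, assuming valid UTF-8 input.
static std::u16string utf8ToUtf16(const std::string &in)
{
    std::u16string out;
    out.reserve(in.size());                      // never more units than bytes
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 1;
        char32_t cp = lead;
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                                 // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical data block: immutable UTF-8 plus a lazily created UTF-16 copy
// reached through an atomic pointer. Not Qt's actual QString layout.
class LazyUtf16Data
{
public:
    explicit LazyUtf16Data(std::string utf8) : m_utf8(std::move(utf8)) {}
    ~LazyUtf16Data() { delete m_utf16.load(std::memory_order_acquire); }
    LazyUtf16Data(const LazyUtf16Data &) = delete;
    LazyUtf16Data &operator=(const LazyUtf16Data &) = delete;

    // A const function that may be called from several threads at once.
    const char16_t *constData() const
    {
        const std::u16string *p = m_utf16.load(std::memory_order_acquire);
        if (!p) {
            // Allocate and convert without any lock.
            auto *fresh = new std::u16string(utf8ToUtf16(m_utf8));
            const std::u16string *expected = nullptr;
            if (m_utf16.compare_exchange_strong(expected, fresh,
                                                std::memory_order_acq_rel,
                                                std::memory_order_acquire)) {
                p = fresh;                       // we published our copy
            } else {
                delete fresh;                    // another thread won the race
                p = expected;                    // use the winner's copy
            }
        }
        return p->c_str();
    }

private:
    const std::string m_utf8;
    mutable std::atomic<const std::u16string *> m_utf16{nullptr};
};
```

The cost of a lost race is one wasted conversion and allocation, which is the
same trade-off the lock-free Qt 4 Q_GLOBAL_STATIC made.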
I'd choose #2 for two reasons:
a) it is closer to the QString I already have for Qt 6, which inlines the
"begin" pointer in the QString object itself;
b) #1 actually has a bigger memory overhead than the current solutions.
But given the choice, I would do nothing. Instead, I have a patch pending for
Qt 6 that caches the Latin1 version of the QString in an extra block past the
UTF-16 data.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center