[Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
Thiago Macieira
thiago.macieira at intel.com
Wed Oct 14 17:51:11 CEST 2015
On Wednesday 14 October 2015 12:16:47 Marc Mutz wrote:
> > >But as a condition to be even considered, it needs to be only for the
> > >methods that do not hold a copy of the string. That is, methods that
> > >immediately consume the string and no longer need to reference its
> > >contents.
>
> Thiago, I think it would help the discussion if you quickly summarised your
> planned changes to QString in Qt 6.
>
> AFAIK, the size and offset will move into the object, so I expected that
> Q6String would subsume QStringRef, because each QString could provide a
> separate view on the shared underlying data. I also was led to believe that
> Q6String would use SSO, which, given its increased sizeof(), would make a
> lot of sense, imo.
Indeed, that's the biggest gain. QString will contain a QStringPrivate, which
is:

    struct QStringPrivate
    {
        QArrayData *d;
        ushort *b;
        qsize size;     // let's bikeshed what qsize is later
    };

My current code initialises a QStringLiteral like so:
# define QStringLiteral(str) \
    ([]() -> QString { \
        QStringPrivate holder = { \
            QArrayData::sharedStatic(), \
            reinterpret_cast<ushort *>(const_cast<qunicodechar *>(QT_UNICODE_LITERAL(str))), \
            sizeof(QT_UNICODE_LITERAL(str))/2 - 1 }; \
        return QString(holder); \
    }())

The separation of the string itself from the size and the d pointer allows the
compiler, if it wants to, to share strings. In fact, disassembly of
    f(QStringLiteral("foo"), QStringLiteral("foo"))
produces one copy of u"foo" only.
Like you said, QString can become its own QStringView/QStringRef/QSubString.
QString::left/mid/right can simply copy the d pointer, increment the refcount,
then adjust b and size. This solves the issue I had with your proposal: if a
QStringView is passed to a method that decides to keep a copy, that copy
cannot participate in reference counting. The drawback is the pathological
case where a short substring holds a large data block hostage.
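
To make the left/mid/right idea concrete, here is a minimal sketch. It is not
the Qt 6 code: SharedString is a hypothetical type, and std::shared_ptr stands
in for the reference-counted QArrayData header. It only shows that taking a
substring can copy the owning pointer and adjust b and size without copying
any characters.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a string that carries its own begin pointer and
// size. std::shared_ptr plays the role of the reference-counted header.
struct SharedString {
    std::shared_ptr<const std::u16string> d; // owning block (the "d pointer")
    const char16_t *b;                       // start of this string's view
    std::size_t size;                        // view length in UTF-16 units

    explicit SharedString(std::u16string text)
        : d(std::make_shared<std::u16string>(std::move(text))),
          b(d->data()), size(d->size()) {}

    // mid() copies d (bumping the refcount) and only adjusts b and size.
    SharedString mid(std::size_t pos, std::size_t n) const {
        assert(pos <= size && n <= size - pos);
        SharedString result(*this); // shares the block, no character copy
        result.b = b + pos;
        result.size = n;
        return result;
    }

    // Materialise the view as an independent string (this does copy).
    std::u16string toStdString() const { return std::u16string(b, size); }
};
```

In this sketch the substring keeps the whole original block alive for as long
as it exists, which is exactly the "hostage" case mentioned above.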
My next objective, not yet achieved due to lack of time, is to make that
QArrayData::sharedStatic() actually be a null pointer. That is, for anything
that we didn't allocate memory for, the d pointer should be null. That implies
a much faster loading of constant QStringLiterals and much faster handling of
the decrement case. The biggest pain point in the code above in my current
version is what happens after the call to f(): the compiler generates two bit
tests of d->flags and calls to QArrayData::deallocate(), all of which are dead
code that will never run.
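
A rough sketch of what a null d pointer buys on the release path. The names
Header, StrPrivate, fromLiteral, allocate, acquire and release are all
hypothetical and are not Qt API. The assumption is simply that "d is null"
means "we did not allocate this data", so releasing a literal is a single
null check with no flag test and no deallocation call.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>

struct Header {                 // hypothetical stand-in for QArrayData
    std::atomic<int> ref{1};
    char16_t *block = nullptr;  // the allocation this header owns
};

struct StrPrivate {             // same three members as the struct above
    Header *d;                  // null: data is static, nothing to free
    const char16_t *b;
    std::size_t size;
};

// Wrap static data: no header, no allocation.
inline StrPrivate fromLiteral(const char16_t *s, std::size_t n) {
    return StrPrivate{nullptr, s, n};
}

// Copy n units into a freshly allocated, reference-counted block.
inline StrPrivate allocate(const char16_t *s, std::size_t n) {
    assert(s != nullptr);
    Header *h = new Header;
    h->block = new char16_t[n + 1];
    std::copy(s, s + n, h->block);
    h->block[n] = u'\0';
    return StrPrivate{h, h->block, n};
}

inline void acquire(const StrPrivate &p) {
    if (p.d)
        p.d->ref.fetch_add(1, std::memory_order_relaxed);
}

// Returns true only if this call freed the block.
inline bool release(StrPrivate &p) {
    if (!p.d)
        return false;           // static data: one null check and done
    if (p.d->ref.fetch_sub(1, std::memory_order_acq_rel) != 1)
        return false;           // other owners remain
    delete[] p.d->block;
    delete p.d;
    p = StrPrivate{nullptr, nullptr, 0};
    return true;
}
```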
After that, implement SSO, which should hold 11 UTF-16 characters, including
the null terminator. If we benchmark and find that we could use more, we can
simply artificially increase sizeof(QString) to 32, which may have some extra
benefits of its own, including the fact that the 24-byte short QString will be
at odds with the null d pointer -- the if (d) check instead becomes
    if (quintptr(d) & ~quintptr(1))
[also note how the order of the members in QStringPrivate needs to change for
big-endian architectures]
[and note everything I say about QString also applies to QByteArray and
QVector]
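
For the SSO numbers and the modified d check above, a small sketch of the
arithmetic. The function names are hypothetical, and the "one unit reserved
for the size and flag" rule is my assumption about where the figure of 11
comes from for a 24-byte object, not something taken from the actual patch.
The pointer test treats both a null d and a d whose only set bit is the
lowest one as "no allocated header".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// UTF-16 units (terminator included) that fit inline in an object of the
// given size, assuming one unit is reserved for the short size and flag.
inline std::size_t ssoCapacity(std::size_t objectBytes) {
    assert(objectBytes >= sizeof(char16_t));
    return objectBytes / sizeof(char16_t) - 1;
}

// Equivalent of "if (quintptr(d) & ~quintptr(1))": true only when d has a
// bit set other than the lowest one, i.e. it is a real header pointer.
inline bool hasAllocatedHeader(const void *d) {
    return (reinterpret_cast<std::uintptr_t>(d) & ~std::uintptr_t(1)) != 0;
}
```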
> And then I thought, QString would be converted to hold UTF-8. I saw
> wip/qstring-utf8 fly by on gerrit, but ok, that hasn't received any updates
> since 2012.
That was when we converted the QString methods taking const char* from Latin1
to UTF-8. The backing store has never changed.
My version of QString stores an extra flag that indicates whether the string is
US-ASCII, in which case we can run the unchecked to-Latin1 algorithm in both
toLatin1 and toUtf8. Another idea I had but haven't investigated is to cache
that result, which requires the returned QByteArray to share the d pointer
with the QString.
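
A sketch of how such a flag could be used. FlaggedString, narrowUnchecked and
the free functions toLatin1/toUtf8 below are hypothetical and only imitate the
idea on top of std::u16string. When the string is known to be US-ASCII, both
conversions reduce to the same unchecked narrowing loop; the slow paths here
are plain reference implementations, with '?' and U+FFFD chosen as
replacement characters for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

struct FlaggedString {          // hypothetical: text plus a cached ASCII flag
    std::u16string text;
    bool isAscii;
    explicit FlaggedString(std::u16string s)
        : text(std::move(s)),
          isAscii(std::all_of(text.begin(), text.end(),
                              [](char16_t c) { return c < 0x80; })) {}
};

// Narrow each unit to one byte with no per-character range handling.
inline std::string narrowUnchecked(const std::u16string &s) {
    std::string out(s.size(), '\0');
    std::transform(s.begin(), s.end(), out.begin(), [](char16_t c) {
        assert(c < 0x80);       // debug-only check of the precondition
        return static_cast<char>(c);
    });
    return out;
}

inline std::string toLatin1(const FlaggedString &s) {
    if (s.isAscii)
        return narrowUnchecked(s.text);      // fast path
    std::string out;
    for (char16_t c : s.text)                // checked path
        out += c <= 0xFF ? static_cast<char>(c) : '?';
    return out;
}

inline std::string toUtf8(const FlaggedString &s) {
    if (s.isAscii)
        return narrowUnchecked(s.text);      // same fast path
    std::string out;
    const std::u16string &t = s.text;
    for (std::size_t i = 0; i < t.size(); ++i) {
        char32_t cp = t[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < t.size()
            && t[i + 1] >= 0xDC00 && t[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (t[i + 1] - 0xDC00);
            ++i;                             // consumed a surrogate pair
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                     // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```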
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center