[Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings

Wed Oct 14 17:51:11 CEST 2015

On Wednesday 14 October 2015 12:16:47 Marc Mutz wrote:
> > >But as a condition to be even considered, it needs to be only for the
> > >methods that do not hold a copy of the string. That is, methods that
> > >immediately consume the string and no longer need to reference its contents.
> 
> Thiago, I think it would help the discussion if you quickly summarised your 
> planned changes to QString in Qt 6.
> 
> AFAIK, the size and offset will move into the object, so I expected that 
> Q6String would subsume QStringRef, because each QString could provide a 
> separate view on the shared underlying data. I also was led to believe that 
> Q6String would use SSO, which, given its increased sizeof(), would make a
> lot of sense, imo.

Indeed, that's the biggest gain. QString will contain a QStringPrivate, which 
is:

struct QStringPrivate
{
    QArrayData *d;
    ushort *b;
    qsize size;		// let's bikeshed what qsize is later
};
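A minimal sketch of that layout, using stand-ins so it compiles without Qt. MockArrayData, mock_qsize and isEmpty are hypothetical names, ushort is replaced by char16_t, and qsize is assumed to be std::ptrdiff_t since its type is still open.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for QArrayData: just the reference count.
struct MockArrayData {
    int ref;
};

// Assumption: qsize is still to be bikeshedded, so a signed
// pointer-sized integer is used here.
using mock_qsize = std::ptrdiff_t;

// Same member order as QStringPrivate (the little-endian ordering).
struct MockStringPrivate {
    MockArrayData *d;  // header with refcount/flags, possibly shared
    char16_t *b;       // first UTF-16 code unit of *this* string
    mock_qsize size;   // length in UTF-16 code units, terminator excluded
};

// Three pointer-sized members and no padding on mainstream ABIs:
// 24 bytes on 64-bit targets, 12 on 32-bit ones.
static_assert(sizeof(MockStringPrivate) == 3 * sizeof(void *),
              "expected three pointer-sized members");

// Hypothetical helper: emptiness depends only on size, never on d.
inline bool isEmpty(const MockStringPrivate &p) {
    assert(p.size >= 0);  // a negative length would be a bug
    return p.size == 0;
}
```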

My current code initialises a QStringLiteral like so:

#  define QStringLiteral(str) \
    ([]() -> QString { \
        QStringPrivate holder = {  \
            QArrayData::sharedStatic(), \
            reinterpret_cast<ushort *>(const_cast<qunicodechar *>(QT_UNICODE_LITERAL(str))), \
            sizeof(QT_UNICODE_LITERAL(str))/2 - 1 }; \
        return QString(holder); \
    }())
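To show the macro's mechanics without Qt, here is a sketch with hypothetical stand-ins: MOCK_STRING_LITERAL, mockSharedStatic, at and the -1 "static" marker are assumptions, not Qt API. Holding a const char16_t pointer avoids the const_cast/reinterpret_cast that the real macro needs to produce a ushort *.

```cpp
#include <cassert>
#include <cstddef>

struct MockArrayData {
    int ref;
};

struct MockStringPrivate {
    MockArrayData *d;
    const char16_t *b;  // points straight into the literal, no cast needed
    std::ptrdiff_t size;
};

// Hypothetical stand-in for QArrayData::sharedStatic(): one header shared
// by every literal, marking the data as static (never freed).
inline MockArrayData *mockSharedStatic() {
    static MockArrayData header = {-1};  // -1: assumed "static data" marker
    return &header;
}

// Same shape as the QStringLiteral macro: u"" str concatenates into a
// UTF-16 literal, and sizeof(...) / sizeof(char16_t) - 1 drops the terminator.
#define MOCK_STRING_LITERAL(str) \
    ([]() -> MockStringPrivate { \
        MockStringPrivate holder = { \
            mockSharedStatic(), \
            u"" str, \
            std::ptrdiff_t(sizeof(u"" str) / sizeof(char16_t) - 1) }; \
        return holder; \
    }())

// Hypothetical bounds-checked accessor for the demo.
inline char16_t at(const MockStringPrivate &s, std::ptrdiff_t i) {
    assert(i >= 0 && i < s.size);
    return s.b[i];
}
```

Because holder stores only a pointer to the literal, two expansions with the same text may point at the same storage. C++ leaves it unspecified whether identical string literals are distinct objects, so that sharing is up to the compiler.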

The separation of the string itself from the size and the d pointer allows the 
compiler, if it wants to, to share strings. In fact, disassembly of 

	f(QStringLiteral("foo"), QStringLiteral("foo")) 

produces one copy of u"foo" only.

Like you said, QString can become its own QStringView/QStringRef/QSubString. 
QString::left/mid/right can simply copy the d pointer, increment the refcount, 
then adjust b and size. This solves the issue I had with your proposal: a 
QStringView passed to a method that decides to keep a copy doesn't participate 
in reference counting, so that method has to copy the characters. The drawback 
is the pathological case where a short substring holds a large data block 
hostage.
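A sketch of substring sharing through reference counting, assuming a hypothetical Block header (a refcount plus a std::u16string payload) and a hypothetical SharedString class holding d, b and size. It models the idea only and is not QString's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shared block: a refcount plus the UTF-16 payload.
struct Block {
    int ref;
    std::u16string chars;
};

// Minimal model of a string that is also its own view: d, b, size.
class SharedString {
public:
    explicit SharedString(const std::u16string &s)
        : d(new Block{1, s}), b(d->chars.data()), size(std::ptrdiff_t(s.size())) {}

    // Copying shares the block: copy d, bump the refcount, keep b and size.
    SharedString(const SharedString &other)
        : d(other.d), b(other.b), size(other.size) { ++d->ref; }
    SharedString &operator=(const SharedString &) = delete;  // omitted for brevity

    ~SharedString() {
        if (--d->ref == 0)
            delete d;
    }

    // Like the proposed QString::mid: no characters are copied; the result
    // references the same block with b and size adjusted.
    SharedString mid(std::ptrdiff_t pos, std::ptrdiff_t len) const {
        assert(pos >= 0 && len >= 0 && pos + len <= size);
        SharedString result(*this);
        result.b += pos;
        result.size = len;
        return result;
    }

    std::u16string toU16() const { return std::u16string(b, std::size_t(size)); }
    int refCount() const { return d->ref; }
    std::size_t blockLength() const { return d->chars.size(); }

private:
    Block *d;
    const char16_t *b;
    std::ptrdiff_t size;
};
```

A one-character substring taken from a large string keeps the whole block alive after the original is destroyed, which is the hostage case: its blockLength() still reports the full original length.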

My next objective, not yet achieved due to lack of time, is to make that 
QArrayData::sharedStatic() actually be a null pointer. That is, for anything 
that we didn't allocate memory for, the d pointer should be null. That implies 
a much faster loading of constant QStringLiterals and much faster handling of 
the decrement case. In my current version, the biggest pain point in the code 
above is what happens after the call to f(): the compiler generates two bit 
tests of d->flags and calls to QArrayData::deallocate(), all dead code that 
will never run.
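A sketch of why a null d for unallocated data helps, using a hypothetical Str class, Block header and deallocation counter. With d null for literals, the destructor's only work is one pointer comparison; no header is loaded and no flags are tested.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical heap header; only the refcount matters here.
struct Block {
    int ref;
};

// Hypothetical counter so the demo can observe deallocations.
inline int &deallocationCount() {
    static int n = 0;
    return n;
}

// Sketch: a null d means "not allocated by us" (e.g. a literal).
class Str {
public:
    Str(Block *data, const char16_t *begin, std::ptrdiff_t length)
        : d(data), b(begin), size(length) { assert(length >= 0); }
    Str(const Str &) = delete;             // copying omitted for brevity
    Str &operator=(const Str &) = delete;

    ~Str() {
        if (d && --d->ref == 0) {  // static data: d is null, nothing to do
            delete d;
            ++deallocationCount();
        }
    }

    bool isStatic() const { return d == nullptr; }
    const char16_t *data() const { return b; }
    std::ptrdiff_t length() const { return size; }

private:
    Block *d;
    const char16_t *b;
    std::ptrdiff_t size;
};
```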

After that, implement SSO, which should hold 11 UTF-16 characters, including 
the null terminator. If we benchmark and find that we could use more, we can 
simply artificially increase sizeof(QString) to 32, which may have some extra 
benefits of its own, including the fact that the 24-byte short QString will be 
at odds with the null d pointer -- the if (d) check instead becomes
	if (quintptr(d) & ~quintptr(1))
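One possible reading of the modified check, as a sketch: it assumes d holds 0 for static data, the value 1 as a short-string tag, and otherwise a real pointer whose alignment keeps bit 0 clear. Block, shortStringTag, ownsHeapBlock and release are hypothetical names; the capacity check only restates the arithmetic that 11 UTF-16 units take 22 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical heap header; its alignment keeps bit 0 of real pointers clear.
struct Block {
    int ref;
};
static_assert(alignof(Block) >= 2, "bit 0 must be free for the tag");

// 11 UTF-16 units including the terminator occupy 22 of the 24 bytes.
constexpr std::size_t kShortCapacity = 11;
static_assert(kShortCapacity * sizeof(char16_t) == 22, "22 bytes of payload");

// Assumed encoding: 0 = static data, 1 = short string, else a heap block.
inline Block *shortStringTag() {
    return reinterpret_cast<Block *>(std::uintptr_t(1));
}

// Replacement for a plain `if (d)`: true only for a real heap block.
inline bool ownsHeapBlock(const Block *d) {
    return (reinterpret_cast<std::uintptr_t>(d) & ~std::uintptr_t(1)) != 0;
}

// Drops one reference; static and short strings skip the decrement entirely.
inline void release(Block *d) {
    if (ownsHeapBlock(d)) {
        assert(d->ref > 0);
        if (--d->ref == 0)
            delete d;
    }
}
```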

[also note how the order of the members in QStringPrivate needs to change for 
big-endian architectures]

[and note everything I say about QString also applies to QByteArray and 
QVector]

> And then I thought, QString would be converted to hold UTF-8. I saw 
> wip/qstring-utf8 fly by on gerrit, but ok, that hasn't received any updates 
> since 2012.

That was when we converted the QString methods taking const char* from Latin1 
to UTF-8. The backing store has never changed.

My version of QString stores an extra flag that indicates whether the string is 
US-ASCII, in which case we can run the unchecked to-Latin1 algorithm in both 
toLatin1 and toUtf8. Another idea I had but haven't investigated is to cache 
that result, which requires the returned QByteArray to share the d pointer 
with the QString.
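A sketch of the US-ASCII fast path, using hypothetical free functions over std::u16string. When the flag is set, one unchecked narrowing yields valid Latin-1 and valid UTF-8 alike, because both agree with US-ASCII below 0x80; narrowAscii would therefore serve a toLatin1 fast path unchanged. The slow path is a plain UTF-16 to UTF-8 encoder written for this example, not QString's code, and it encodes lone surrogates as-is instead of replacing them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The proposal stores this as a flag; here it is computed by a scan.
inline bool isUsAscii(const std::u16string &s) {
    for (char16_t c : s)
        if (c >= 0x80)
            return false;
    return true;
}

// Unchecked narrowing: correct only when every code unit is below 0x80.
inline std::string narrowAscii(const std::u16string &s) {
    assert(isUsAscii(s));
    std::string out;
    out.reserve(s.size());
    for (char16_t c : s)
        out.push_back(char(c));
    return out;
}

inline std::string toUtf8(const std::u16string &s, bool asciiFlag) {
    if (asciiFlag)
        return narrowAscii(s);  // fast path: no per-character branching
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        // Combine a valid surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size()
                && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out.push_back(char(cp));
        } else if (cp < 0x800) {
            out.push_back(char(0xC0 | (cp >> 6)));
            out.push_back(char(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(char(0xE0 | (cp >> 12)));
            out.push_back(char(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(char(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(char(0xF0 | (cp >> 18)));
            out.push_back(char(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(char(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(char(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```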

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center