[Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings

Wed Oct 14 12:16:47 CEST 2015

On Wednesday 14 October 2015 08:37:19 Knoll Lars wrote:
> I’m not a huge fan of having different overloads with QString, QStringRef
> and QLatin1String and in some cases (QChar *, int) for many methods
> neither. But while your proposal solves some problems it introduces others.
> 
> A QStringView class would only work for methods that read the data
> contained in it, but don’t try to modify it or take a copy (as Thiago
> pointed out).

I do not agree with that statement.

First, afaiu from what Thiago mentions in reviews, Q6String will have SSO 
(small-string-optimisation) which makes many short strings expensive to copy 
(if you think that copying 24 bytes is slower than upping an atomic int 
through an indirection) or cheap to copy (if you think the opposite). In any 
case, small strings will be very cheap to create (no allocation), so for many 
strings there will be not much difference between passing a QStringView or 
passing a QString.

Second, upon modification, the QString will detach (make a copy), and _then_ 
perform the operation. With a QStringView and an efficient base of operations, 
those two operations can be folded into one (basically as the const versions 
of QString methods, where they exist, do (or should do)). Unless the operation 
can and will actually be done in the original allocation (ie. incl. no 
detach), the const methods should be faster. That will never be the case when 
you pass QString by const-&, because there will always be the lvalue parameter 
attached to the QString instance. For modification in-place to work, you need 
to pass by rvalue ref. So for typical functions modifying the string, there's 
also no difference between QString and QStringView.

That leaves classes which simply store the string. You cited QUrl. I don't see 
a problem providing QString overloads for these, esp. considering that we're 
starting out with an all-QString API here. Then again, once we have 
QStringView overloads, we can simply disable the QString overloads and see the 
effect.

BTW: functions storing a passed QString as-is should provide a QString&& 
overload, and that might be a good idea even when otherwise using QStringView 
only.

> And you certainly can’t keep the pointer to the data around
> for longer than the lifetime of the QStringView, so it’s to some extent an
> advanced class you have to be careful when using in your own APIs.

It's like the distinction between QModelIndex and QPersistentModelIndex. The 
first is an interface type, the latter the storage type. Neither is more 
"advanced" than the other. They are complements.

> So it can work nicely for methods such as QString::indexOf and similar,
> but will never be good for methods that need to copy the string (e.g.
> QUrl::setHostName).
> 
> 
> Another thing I wonder about is whether we shouldn’t deprecate
> QLatin1String moving forward. We have QStringLiteral, and even though it’s
> implementation is not ideal, we should be able to get it working
> everywhere now with Qt 5.7. Let’s think about how and whether we can
> improve it’s implementation to fix the remaining issues. Then we could
> remove/deprecate QLatin1String.

There are problems in QStringLiteral that cannot be solved. Common data 
sharing will never happen with the current syntax. I'd suggest a 
QStaticString, a fully constexpr wrapper around QStaticStringData<N>, 
basically to determine the N transparently, which can be used as a variable at 
namespace scope in lieu of the current need to pack all QStringLiterals into 
static inline functions. But that's outside the scope of this thread, so let's 
not go there.

> On 13/10/15 23:01, "Thiago Macieira" <thiago.macieira at intel.com> wrote:
> >On Tuesday 13 October 2015 22:46:36 Marc Mutz wrote:
> >> Q: What mistakes do you refer to?
> >>
> >> 
> >>
> >> A: The fact that it has copy ctor and assignment operator, so it's not a
> >> trivally-copyable type and thus cannot efficiently passed by-value. It
> >>
> >>may
> >>
> >> also be too large for pass-by-value due to the rather useless QString
> >> pointer (should have been QStringData*, if any). Neither can be fixed
> >> before Qt 6.
> >
> >Not even in Qt 6. The reason why it uses a QString pointer is that it
> >follows 
> >the QString through reallocations. If the QString is mutated, the
> >QStringRef 
> >will still be valid (provided it isn't shortened beyond the substring the
> >QStringRef points to). There's a lot of code that depends on this, so we
> >can't 
> >change it.

QString foo = "foo";
QStringRef ref = foo.midRef(1); // ref == "oo";
foo = "bar"; // oops, ref == "ar";

We could change it to hold QString::Data* instead, though, right? And make it 
share ownership of the QString::Data, in which case we have a QString that has 
position and size inline. Or, if it doesn't participate in the ownership, we 
can start returning QStringRef from QStringLiteral(Ref?), killing one major 
QSL problem (out-of-line QString dtor litter).

> Only by deprecating QStringRef and not using it ourselves anymore. But
> it’s used quite a lot in Qt, so this is no easy job and will certainly
> break source compatibility in places such as the XML stream reader.
> 
> >> Q: Why size_t?
> >>
> >> 
> >>
> >> A: The intent of QStringView (and std::experimental::string_view) is to
> >>
> >>act
> >>
> >> as an interface between modules written with different compilers and
> >> different flags. A std::string will never be compatible between
> >>
> >>compilers
> >>
> >> or even just different flags, but a simple struct {char*, size_t} will
> >> always be, by way of it's C compatibility.
> >>
> >> 
> >>
> >> So the goal is not just to accept QString, QStringRef, and (QChar*,int)
> >>
> >>(and
> >>
> >> QVarLengthArray<QChar>!) as input to QStringView, but also
> >> std::basic_string<char16_t> and std::vector<char16_t>.
> >
> >The C++ committee's current stance on signed vs unsigned is that you
> >should 
> >use signed for everything, except when you want to have modulo-2
> >overflows. 
> >We're not overflowing, so it should be signed.
> 
> Yes, signed please. We can discuss whether it should be 64bit for Qt 6.

The current std API uses size_t. Do you (= both of you) expect that ever to 
change? If it doesn't, Qt will forever be the odd one out, until we finally 
drop QVector etc for std::vector etc and then porting will be a horror because 
of MSVC's annoying warnings.

[...]
> >> Q: What about QLatin1String?
> >>
> >> 
> >>
> >> A: Once QString is backed by UTF-8, latin-1 ceases to be a special
> >>
> >>charset.
> >>
> >> We might want something like QUsAsciiString, but it would just be a
> >>
> >>UTF-8
> >>
> >> string, so it could be packed into QStringView.
> >
> >Since QString will not be backed by UTF-8, the answer is irrelevant.

I don't know who I spoke to at QtWS, but they made it sound differently, citing 
Mozilla as having converted back to UTF-8 after using a fixed-width string type 
for some time. And I found that reasonable. After all, most text _is_ us-
ascii, even in Chinese native applications, and seeing as memory bandwidth is 
the limiting factor these days and will continue to be for the forseeable 
future, a more compact string storage should win the day. Esp. for SSO, having 
possibly twice as much characters should go a long way to avoid allocations.

But that's another topic, too.

> Agree here as well. We can’t make QString utf-8 backed without breaking
> way too much code. I also don’t see the need for it. The native encoding
> on Windows and Mac (Cocoa) is utf-16 as well, on Linux it’s utf-8. So no
> matter which platform we’re on, we won’t avoid some conversions.
> 
> And I will strongly oppose any attempts to make QString some sort of
> hybrid supporting both. The added complexity in maintaining the code base
> is simply not worth it.
> 
> >> Q: What about QByteArray, QVector?
> >>
> >> 
> >>
> >> A: I'm unsure about QByteArrayView. It might not pull its weight
> >>
> >>compared to
> >>
> >> std::(experimental::)string_view, but I also note that we're currently
> >> missing a QByteArrayRef, so a QBAView might make sense while we wait for
> >> the std one to become available to us.
> >
> >Given the mistakes that you and I are pointing out in QStringRef, we
> >should 
> >not add QByteArrayRef. Instead, it should be in the new-style, in which
> >case I 
> >wonder whether we should add a class in the first place. And moreover,
> >how 
> >often is this needed? std::array_view should be plenty for QByteArray and
> >QVector where needed.

array_view cannot compete with QByteArray's API. E.g. there's no toInt(). The 
need is there. I often see code that would benefit from a non-allocating Ref, 
but cannot because it's operating at the QBA level. So I do see a use-case for 
QByteArrayView. Or Ref.

> Agreed as well.
> 
> >> I'm actively opposed to a QArrayView, because I don't think it provides
> >>
> >>us
> >>
> >> with anything std::(experimental::)array_view doesn't already.
> >
> >Right.
> >
> >> Q: What do you mean when you say "abandon QString"?
> >>
> >> 
> >>
> >> A: I mean that functions should not take QStrings as arguments, but
> >> QStringViews. Then users can transparently pass QString, QStringRef and
> >>
> >>any
> >>
> >> of a number of other "string" types without overloading the function on
> >> each of them.
> >>
> >> 
> >>
> >> I do not mean to abandon QString, the class. Only QString, the interface
> >> type.
> >
> >I'm not agreeing to the proposal just yet.
> >
> >But as a condition to be even considered, it needs to be only for the
> >methods 
> >that do not hold a copy of the string. That is, methods that immediately
> >consume the string and no longer need to reference its contents.

Thiago, I think it would help the discussion if you quickly summarised your 
planned changes to QString in Qt 6.

AFAIK, the size and offset will move into the object, so I expected that 
Q6String would subsume QStringRef, because each QString could provide a 
separate view on the shared underlying data. I also was led to believe that 
Q6String would use SSO, which, given its inceased sizeof(), would make a lot 
of sense, imo.

And then I thought, QString would be converted to hold UTF-8. I saw 
wip/qstring-utf8 fly by on gerrit, but ok, that hasn't received any updates 
since 2012.

> >Methods that keep a copy for any reason (e.g., QFile::setFilename) should
> >still keep a QString API so that they can participate in the reference
> >counting.
> 
> Yes, we can’t do it differently. That immediately brings up the problem of
> how to make things future proof. Suppose we have an API that takes a
> QStringView because we’re not taking a copy of the string. Two minor
> releases later we find out that we need a copy for some reason
> nevertheless. What do we do?

We take a deep copy and measure. If and only if there's a benefit in real-world 
applications, we can add a QString overload back.

Has anyone, ever, measured the average data sharing from QString COW? Or the 
speedup? It should be simple: just add a detach() to the copy ctor and 
assignment operator, compile (in C++11!!) and benchmark memory 
use/fragmentation and cpu cycles used. Of, say, QtDesigner or QtCreator. Herb 
Sutter's benchmark is more than 15 years old, I wonder how it would behave 
under the increased spread of cache and main memory.

I once compiled a log of string operations for QtDesigner, but GCC didn't want 
to compile the resulting benchmark (stopped it after ~1h) and I don't have the 
data anymore.

> I’m not saying that a QStringView might not be a good idea, but we have to
> be careful where we use it. I’d say that for most cases we want to
> continue to pass a const QString & into the methods. QStringView would be
> reserved for performance critical parts of the API, which 90% of our API
> is not.

None of us know which part of the API is performance critical. We're 
developing a library, and we can't possibly know what all users are doing with 
it. Who's to say QDir isn't someone's bottleneck? And who's to say all users 
report this kind of stuff, if it happens to their app?

Take QDateTime as a warning. Take a 3x speedup in QString::multiArg() as a 
sign that every cloud has a silver lining.

To get back on topic: You focus on the drawbacks of one use-case (common as it 
may be) for a QString that is stored as-is in a class. You forget about all 
the conversions and allocations that are prevented by requiring QString 
everywhere. E.g. local QStrings could be replaced by QVarLengthArray<QChar>, 
e.g. in code like this:

    QString joinstr = QLatin1String("\"\n");
    joinstr += indent;
    joinstr += indent;
    joinstr += QLatin1Char('"');

    QString rc(QLatin1Char('"'));
    rc += result.join(joinstr);
    rc += QLatin1Char('"');

Even when replaced with QStringBuilder:

    QString joinstr = QLatin1String("\"\n") + indent + indent
                       + QLatin1Char('"');

    QString rc = QLatin1Char('"') + result.join(joinstr) + QLatin1Char('"');

But the main benefit would be that we only ever *need* one overload for a 
QString-taking function, not N, N > 3. The API becomes comprehensible again, 
while at the same time being near-maximally efficient[1].

And yes, this makes string handling in Qt even more complicated. Because 
there's yet another class to learn.

Or maybe not. Because there'd be (eventually) less overloads, less _forgotten_ 
overloads(!), simpler API, and _potential_ for effciency.   If users use 
QString everywhere and ignore QStringRef and QStringBuilder _now_, that's 
their problem, but it will continue to work. But if they hit a performance 
problem, it would be nice to know that the API doesn't stand in the way of 
fixing it.

In any case, I think by the time we prepare Qt 6, we will not have any 
resources to go through all of Qt and do a transformation like this. That's 
why I'd suggest to start introducing QStringView in Qt 5.7. Hopefully, by Qt 
6, we have gained some experience and data that show how it compares to 
QString-taking API, and how to go forward with it (or not).

Thanks,
Marc

[1] If you think the Qt string API isn't _that_ bad, please tell me:
  a) how many op+ are mssing when you don't compile with QT_USE_QSTRINGBUILDER
  b) how many QString::append() overloads are there (don't look!)?
  c) for A, B \in {QChar, QLatin1Char, QString, QLatin1String, QStringRef},
     does A == B compile? Is it efficient?

-- 
Marc Mutz <marc.mutz at kdab.com> | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
Tel: +49-30-521325470
KDAB - The Qt Experts