[Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings
Marc Mutz
marc.mutz at kdab.com
Wed Oct 14 12:16:47 CEST 2015
On Wednesday 14 October 2015 08:37:19 Knoll Lars wrote:
> I’m not a huge fan of having different overloads with QString, QStringRef
> and QLatin1String and in some cases (QChar *, int) for many methods
> neither. But while your proposal solves some problems it introduces others.
>
> A QStringView class would only work for methods that read the data
> contained in it, but don’t try to modify it or take a copy (as Thiago
> pointed out).
I do not agree with that statement.
First, afaiu from what Thiago mentions in reviews, Q6String will have SSO
(small-string-optimisation) which makes many short strings expensive to copy
(if you think that copying 24 bytes is slower than upping an atomic int
through an indirection) or cheap to copy (if you think the opposite). In any
case, small strings will be very cheap to create (no allocation), so for many
strings there will not be much difference between passing a QStringView or
passing a QString.
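To make the SSO point concrete, here is a minimal sketch assuming a hypothetical `SsoString` with an 11-code-unit inline buffer. The class and the capacity are illustrative only and say nothing about Q6String's actual layout. Short strings are stored inside the object itself, so creating them never allocates. Copying is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical SsoString: a minimal small-string-optimisation sketch, not
// Q6String. Up to kInlineCapacity UTF-16 code units are stored inside the
// object itself, so constructing a short string never touches the heap.
class SsoString {
public:
    static constexpr std::size_t kInlineCapacity = 11;  // assumed, for illustration

    SsoString(const char16_t* s, std::size_t n) : size_(n) {
        assert(s != nullptr || n == 0);
        if (n > kInlineCapacity)
            heap_ = std::make_unique<char16_t[]>(n);  // long string: one allocation
        std::copy_n(s, n, heap_ ? heap_.get() : inline_);
    }

    bool isInline() const { return !heap_; }
    std::size_t size() const { return size_; }
    const char16_t* data() const { return heap_ ? heap_.get() : inline_; }

private:
    std::size_t size_;
    char16_t inline_[kInlineCapacity] = {};
    std::unique_ptr<char16_t[]> heap_;  // null while the string is inline
};
```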
Second, upon modification, the QString will detach (make a copy), and _then_
perform the operation. With a QStringView and an efficient base of operations,
those two operations can be folded into one (basically what the const versions
of QString methods, where they exist, do, or should do). Unless the operation
can and will actually be done in the original allocation (i.e. including no
detach), the const methods should be faster. And in-place modification will
never happen when you pass QString by const-&, because the caller's lvalue
argument still shares the data, so the callee's copy must detach. For
modification in-place to work, you need to pass by rvalue ref. So for typical
functions modifying the string, there's also no difference between QString
and QStringView.
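The difference between "detach, then modify" and a fused const operation can be sketched with std::u16string / std::u16string_view standing in for QString / QStringView (an assumption made to keep the example self-contained). `toUpperAscii` is a hypothetical helper that only handles ASCII letters. The view overload builds the result in a single pass over the input; only the rvalue overload may work in the caller's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: upper-cases ASCII letters only, to keep the sketch
// free of Unicode tables.
inline char16_t upperAscii(char16_t c) {
    return (c >= u'a' && c <= u'z') ? char16_t(c - u'a' + u'A') : c;
}

// View overload: "copy" and "modify" are folded into one pass. There is no
// intermediate copy of the input that is later mutated; each input code
// unit is read once and its transformed value stored in the result.
inline std::u16string toUpperAscii(std::u16string_view in) {
    std::u16string out(in.size(), u'\0');
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = upperAscii(in[i]);
    return out;
}

// Rvalue overload: the caller handed over its string, so its buffer can be
// modified in place without any allocation.
inline std::u16string toUpperAscii(std::u16string&& in) {
    for (char16_t& c : in)
        c = upperAscii(c);
    return std::move(in);
}
```

Callers holding an lvalue string get the view overload; callers that no longer need their string can hand it over with `std::move`. As written, a bare `u"..."` literal argument would be ambiguous between the two overloads, so a real API would add a `const char16_t*` overload as a tie-breaker.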
That leaves classes which simply store the string. You cited QUrl. I don't see
a problem providing QString overloads for these, esp. considering that we're
starting out with an all-QString API here. Then again, once we have
QStringView overloads, we can simply disable the QString overloads and see the
effect.
BTW: functions storing a passed QString as-is should provide a QString&&
overload, and that might be a good idea even when otherwise using QStringView
only.
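A minimal sketch of the QString&& suggestion, again with std types standing in for Qt ones and a hypothetical `UrlSketch` class (not QUrl). The view overload makes the one unavoidable deep copy, while the rvalue overload just takes over the caller's buffer.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical class; only the overload set of the storing setter matters.
class UrlSketch {
public:
    // Any string-like input: one deep copy into owned storage.
    void setHost(std::u16string_view host) { host_.assign(host.data(), host.size()); }

    // Caller no longer needs its string: adopt the buffer, no copy.
    void setHost(std::u16string&& host) { host_ = std::move(host); }

    // Tie-breaker for literals, which would otherwise be ambiguous between
    // the two overloads above (both would need a user-defined conversion).
    void setHost(const char16_t* host) { setHost(std::u16string_view(host)); }

    const std::u16string& host() const { return host_; }

private:
    std::u16string host_;
};
```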
> And you certainly can’t keep the pointer to the data around
> for longer than the lifetime of the QStringView, so it’s to some extent an
> advanced class you have to be careful when using in your own APIs.
It's like the distinction between QModelIndex and QPersistentModelIndex. The
first is an interface type, the latter the storage type. Neither is more
"advanced" than the other. They are complements.
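The interface/storage split can be sketched with std types, assuming std::u16string_view as the interface type and std::u16string as the storage type. `SettingsSketch` is a hypothetical class. The view is what crosses the API; the class converts it into owned storage only where it actually keeps the data, and lookups need no temporary string at all.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// Hypothetical class. std::u16string is the storage type (it owns its
// characters and stays valid on its own, like QPersistentModelIndex);
// std::u16string_view is the interface type (cheap and non-owning, valid
// only while the referenced data lives, like QModelIndex).
class SettingsSketch {
public:
    // Stores: converts the view into the storage type exactly once.
    void set(std::u16string_view key, int value) { values_[std::u16string(key)] = value; }

    // Reads: std::less<> enables heterogeneous lookup, so the query does
    // not create a temporary std::u16string.
    int value(std::u16string_view key, int defaultValue = 0) const {
        auto it = values_.find(key);
        return it != values_.end() ? it->second : defaultValue;
    }

private:
    std::map<std::u16string, int, std::less<>> values_;
};
```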
> So it can work nicely for methods such as QString::indexOf and similar,
> but will never be good for methods that need to copy the string (e.g.
> QUrl::setHostName).
>
>
> Another thing I wonder about is whether we shouldn’t deprecate
> QLatin1String moving forward. We have QStringLiteral, and even though its
> implementation is not ideal, we should be able to get it working
> everywhere now with Qt 5.7. Let’s think about how and whether we can
> improve its implementation to fix the remaining issues. Then we could
> remove/deprecate QLatin1String.
There are problems in QStringLiteral that cannot be solved. Common data
sharing will never happen with the current syntax. I'd suggest a
QStaticString, a fully constexpr wrapper around QStaticStringData<N>,
basically to determine the N transparently, which can be used as a variable at
namespace scope in lieu of the current need to pack all QStringLiterals into
static inline functions. But that's outside the scope of this thread, so let's
not go there.
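Purely as an illustration of "determining the N transparently", here is a sketch assuming C++17 class template argument deduction. `StaticString` is a hypothetical name, and this is not QStaticStringData or any proposed Qt class. The literal's length becomes the template argument, and the result is an ordinary constexpr variable at namespace scope.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical StaticString<N>: N is deduced from the literal, so no helper
// function or macro is needed to spell it out.
template <std::size_t N>
struct StaticString {
    char16_t data[N] = {};  // includes the terminating u'\0'

    constexpr StaticString(const char16_t (&s)[N]) {
        for (std::size_t i = 0; i < N; ++i)
            data[i] = s[i];
    }

    constexpr std::size_t size() const { return N - 1; }
    constexpr std::u16string_view view() const { return {data, N - 1}; }
};

// A namespace-scope constant; N == 6 is deduced from u"Hello".
inline constexpr StaticString greeting{u"Hello"};
```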
> On 13/10/15 23:01, "Thiago Macieira" <thiago.macieira at intel.com> wrote:
> >On Tuesday 13 October 2015 22:46:36 Marc Mutz wrote:
> >> Q: What mistakes do you refer to?
> >>
> >> A: The fact that it has copy ctor and assignment operator, so it's not a
> >> trivially copyable type and thus cannot efficiently be passed by-value. It
> >> may also be too large for pass-by-value due to the rather useless QString
> >> pointer (should have been QStringData*, if any). Neither can be fixed
> >> before Qt 6.
> >
> >Not even in Qt 6. The reason why it uses a QString pointer is that it follows
> >the QString through reallocations. If the QString is mutated, the QStringRef
> >will still be valid (provided it isn't shortened beyond the substring the
> >QStringRef points to). There's a lot of code that depends on this, so we
> >can't change it.
QString foo = "foo";
QStringRef ref = foo.midRef(1); // ref == "oo";
foo = "bar"; // oops, ref == "ar";
We could change it to hold QString::Data* instead, though, right? And make it
share ownership of the QString::Data, in which case we have a QString that has
position and size inline. Or, if it doesn't participate in the ownership, we
can start returning QStringRef from QStringLiteral(Ref?), killing one major
QSL problem (out-of-line QString dtor litter).
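The "follow the QString" behaviour can be reproduced with a sketch that, like QStringRef, keeps a pointer to the owning string plus position and length. std::u16string stands in for QString; `FollowingRef` is hypothetical. Each access re-reads the owner, which is why it survives reallocations and also why it silently sees new contents. A plain {pointer, size} view would dangle instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical QStringRef-like type: owner pointer + position + length.
struct FollowingRef {
    const std::u16string* owner;
    std::size_t pos;
    std::size_t len;

    // Re-reads the owner on every call, so it tracks reallocations.
    // substr() clamps len if the owner became shorter (and throws
    // std::out_of_range if it became shorter than pos).
    std::u16string_view view() const { return std::u16string_view(*owner).substr(pos, len); }
};

// Following the owner does not by itself require a user-declared copy
// constructor: this struct is trivially copyable.
static_assert(std::is_trivially_copyable_v<FollowingRef>);
```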
> Only by deprecating QStringRef and not using it ourselves anymore. But
> it’s used quite a lot in Qt, so this is no easy job and will certainly
> break source compatibility in places such as the XML stream reader.
>
> >> Q: Why size_t?
> >>
> >> A: The intent of QStringView (and std::experimental::string_view) is to act
> >> as an interface between modules written with different compilers and
> >> different flags. A std::string will never be compatible between compilers
> >> or even just different flags, but a simple struct {char*, size_t} will
> >> always be, by way of its C compatibility.
> >>
> >> So the goal is not just to accept QString, QStringRef, and (QChar*,int) (and
> >> QVarLengthArray<QChar>!) as input to QStringView, but also
> >> std::basic_string<char16_t> and std::vector<char16_t>.
> >
> >The C++ committee's current stance on signed vs unsigned is that you should
> >use signed for everything, except when you want to have modulo-2 overflows.
> >We're not overflowing, so it should be signed.
>
> Yes, signed please. We can discuss whether it should be 64bit for Qt 6.
The current std API uses size_t. Do you (= both of you) expect that ever to
change? If it doesn't, Qt will forever be the odd one out, until we finally
drop QVector etc. for std::vector etc. and then porting will be a horror because
of MSVC's annoying signed/unsigned conversion warnings.
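The trade-off being argued about can be shown in a few lines. This is a sketch only: `SignedSize` is a hypothetical stand-in for a signed Qt size type, and std::u16string stands in for any std container. Unsigned sizes wrap around on `size() - 1`; signed sizes give the natural answer but need a cast wherever they meet the std API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical stand-in for a signed Qt size type.
using SignedSize = std::ptrdiff_t;

// std::size_t, as used by the std API: for an empty string, size() - 1
// wraps around to the largest size_t value instead of becoming -1.
inline std::size_t lastIndexUnsigned(const std::u16string& s) { return s.size() - 1; }

// Signed size: the same expression yields -1, but every crossing into the
// std API needs a cast (leaving it out is typically what produces compiler
// warnings about signed/unsigned conversions, e.g. on MSVC).
inline SignedSize lastIndexSigned(const std::u16string& s) {
    return static_cast<SignedSize>(s.size()) - 1;
}
```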
[...]
> >> Q: What about QLatin1String?
> >>
> >> A: Once QString is backed by UTF-8, latin-1 ceases to be a special charset.
> >> We might want something like QUsAsciiString, but it would just be a UTF-8
> >> string, so it could be packed into QStringView.
> >
> >Since QString will not be backed by UTF-8, the answer is irrelevant.
I don't know who I spoke to at QtWS, but they made it sound different, citing
Mozilla as having converted back to UTF-8 after using a fixed-width string type
for some time. And I found that reasonable. After all, most text _is_
US-ASCII, even in Chinese native applications, and seeing as memory bandwidth is
the limiting factor these days and will continue to be for the foreseeable
future, a more compact string storage should win the day. Esp. for SSO, fitting
possibly twice as many characters into the inline buffer should go a long way
towards avoiding allocations. But that's another topic, too.
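The compactness argument can be checked with a little arithmetic on literal sizes (escapes keep the result independent of the source file's encoding). ASCII text takes half the bytes in UTF-8, while BMP CJK characters such as U+4E2D and U+6587 take three bytes each in UTF-8 against two in UTF-16. The 22-byte inline buffer is an assumed figure, used only to show the "twice as many characters" effect for SSO.

```cpp
#include <cassert>
#include <cstddef>

// Code units times bytes per code unit, taken from literal sizes minus the
// terminating NUL.
constexpr std::size_t asciiUtf8Bytes  = sizeof(u8"Hello") - 1;                       // 5 x 1 byte
constexpr std::size_t asciiUtf16Bytes = sizeof(u"Hello") - sizeof(char16_t);         // 5 x 2 bytes
constexpr std::size_t cjkUtf8Bytes    = sizeof(u8"\u4E2D\u6587") - 1;                // 2 chars x 3 bytes
constexpr std::size_t cjkUtf16Bytes   = sizeof(u"\u4E2D\u6587") - sizeof(char16_t);  // 2 chars x 2 bytes

// ASCII characters that fit into an assumed 22-byte inline (SSO) buffer.
constexpr std::size_t ssoBytes = 22;
constexpr std::size_t ssoAsciiCharsUtf8  = ssoBytes / 1;
constexpr std::size_t ssoAsciiCharsUtf16 = ssoBytes / sizeof(char16_t);
```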
> Agree here as well. We can’t make QString utf-8 backed without breaking
> way too much code. I also don’t see the need for it. The native encoding
> on Windows and Mac (Cocoa) is utf-16 as well, on Linux it’s utf-8. So no
> matter which platform we’re on, we won’t avoid some conversions.
>
> And I will strongly oppose any attempts to make QString some sort of
> hybrid supporting both. The added complexity in maintaining the code base
> is simply not worth it.
>
> >> Q: What about QByteArray, QVector?
> >>
> >> A: I'm unsure about QByteArrayView. It might not pull its weight compared to
> >> std::(experimental::)string_view, but I also note that we're currently
> >> missing a QByteArrayRef, so a QBAView might make sense while we wait for
> >> the std one to become available to us.
> >
> >Given the mistakes that you and I are pointing out in QStringRef, we should
> >not add QByteArrayRef. Instead, it should be in the new-style, in which case I
> >wonder whether we should add a class in the first place. And moreover, how
> >often is this needed? std::array_view should be plenty for QByteArray and
> >QVector where needed.
array_view cannot compete with QByteArray's API. E.g. there's no toInt(). The
need is there. I often see code that would benefit from a non-allocating Ref,
but cannot because it's operating at the QBA level. So I do see a use-case for
QByteArrayView. Or Ref.
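A sketch of the kind of non-allocating parsing a byte-array view would enable. `toInt` here is a hypothetical free function, not QByteArray's member, with std::string_view playing the byte view and C++17 `std::from_chars` doing the parsing. A number is extracted from the middle of a larger buffer without creating a new byte array.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses an int directly from a non-owning view into
// someone else's buffer; nothing is allocated. The whole view must be a
// number for *ok to become true.
inline int toInt(std::string_view bytes, bool* ok = nullptr) {
    int value = 0;
    const char* end = bytes.data() + bytes.size();
    auto [ptr, ec] = std::from_chars(bytes.data(), end, value);
    const bool success = ec == std::errc() && ptr == end;
    if (ok)
        *ok = success;
    return success ? value : 0;
}
```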
> Agreed as well.
>
> >> I'm actively opposed to a QArrayView, because I don't think it provides us
> >> with anything std::(experimental::)array_view doesn't already.
> >
> >Right.
> >
> >> Q: What do you mean when you say "abandon QString"?
> >>
> >> A: I mean that functions should not take QStrings as arguments, but
> >> QStringViews. Then users can transparently pass QString, QStringRef and any
> >> of a number of other "string" types without overloading the function on
> >> each of them.
> >>
> >> I do not mean to abandon QString, the class. Only QString, the interface
> >> type.
> >
> >I'm not agreeing to the proposal just yet.
> >
> >But as a condition to be even considered, it needs to be only for the methods
> >that do not hold a copy of the string. That is, methods that immediately
> >consume the string and no longer need to reference its contents.
Thiago, I think it would help the discussion if you quickly summarised your
planned changes to QString in Qt 6.
AFAIK, the size and offset will move into the object, so I expected that
Q6String would subsume QStringRef, because each QString could provide a
separate view on the shared underlying data. I also was led to believe that
Q6String would use SSO, which, given its increased sizeof(), would make a lot
of sense, imo.
And then I thought, QString would be converted to hold UTF-8. I saw
wip/qstring-utf8 fly by on gerrit, but ok, that hasn't received any updates
since 2012.
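What "size and offset inside the object" could buy can be sketched as follows, under stated assumptions: a hypothetical `SharedSliceString` over an immutable shared buffer, with mutation and copy-on-write left out. Several strings share one buffer while each keeps its own window onto it, so `mid()` allocates nothing and a QStringRef-style substring is just another string.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical class: shared immutable data plus a per-object window.
class SharedSliceString {
public:
    explicit SharedSliceString(std::u16string s)
        : data_(std::make_shared<std::u16string>(std::move(s))),
          offset_(0),
          size_(data_->size()) {}

    // Returns a string sharing the same buffer, with its own window.
    SharedSliceString mid(std::size_t pos, std::size_t n) const {
        assert(pos <= size_ && n <= size_ - pos);
        SharedSliceString r = *this;  // copies the shared_ptr: refcount + 1
        r.offset_ += pos;
        r.size_ = n;
        return r;
    }

    std::u16string_view view() const {
        return std::u16string_view(*data_).substr(offset_, size_);
    }

    long useCount() const { return data_.use_count(); }

private:
    std::shared_ptr<const std::u16string> data_;
    std::size_t offset_;
    std::size_t size_;
};
```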
> >Methods that keep a copy for any reason (e.g., QFile::setFileName) should
> >still keep a QString API so that they can participate in the reference
> >counting.
>
> Yes, we can’t do it differently. That immediately brings up the problem of
> how to make things future proof. Suppose we have an API that takes a
> QStringView because we’re not taking a copy of the string. Two minor
> releases later we find out that we need a copy for some reason
> nevertheless. What do we do?
We take a deep copy and measure. If and only if there's a benefit in real-world
applications, we can add a QString overload back.
Has anyone, ever, measured the average data sharing from QString COW? Or the
speedup? It should be simple: just add a detach() to the copy ctor and
assignment operator, compile (in C++11!!) and benchmark memory
use/fragmentation and CPU cycles used by, say, QtDesigner or QtCreator. Herb
Sutter's benchmark is more than 15 years old; I wonder how it would behave
given today's much larger gap between cache and main-memory speed.
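The proposed experiment can be shown in miniature. This sketch assumes a hypothetical `CowString` whose copies normally share one buffer (implicit sharing); a global switch makes the copy constructor and assignment operator deep-copy ("detach") immediately. Only allocations are counted here; the real experiment would measure memory use and CPU time of a whole application.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical implicitly shared string with a "detach on copy" switch.
class CowString {
public:
    static inline bool forceDeepCopy = false;  // the "add a detach()" switch
    static inline int allocations = 0;         // string buffers created so far

    explicit CowString(std::u16string s) : d_(allocate(std::move(s))) {}

    // Shares the buffer, unless forceDeepCopy makes it detach right away.
    CowString(const CowString& other)
        : d_(forceDeepCopy ? allocate(*other.d_) : other.d_) {}

    CowString& operator=(const CowString& other) {
        if (this != &other)
            d_ = forceDeepCopy ? allocate(*other.d_) : other.d_;
        return *this;
    }

    const std::u16string& str() const { return *d_; }

private:
    static std::shared_ptr<const std::u16string> allocate(std::u16string s) {
        ++allocations;
        return std::make_shared<std::u16string>(std::move(s));
    }

    std::shared_ptr<const std::u16string> d_;
};
```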
I once compiled a log of string operations for QtDesigner, but GCC didn't want
to compile the resulting benchmark (stopped it after ~1h) and I don't have the
data anymore.
> I’m not saying that a QStringView might not be a good idea, but we have to
> be careful where we use it. I’d say that for most cases we want to
> continue to pass a const QString & into the methods. QStringView would be
> reserved for performance critical parts of the API, which 90% of our API
> is not.
None of us know which part of the API is performance critical. We're
developing a library, and we can't possibly know what all users are doing with
it. Who's to say QDir isn't someone's bottleneck? And who's to say all users
report this kind of stuff, if it happens to their app?
Take QDateTime as a warning. Take a 3x speedup in QString::multiArg() as a
sign that every cloud has a silver lining.
To get back on topic: you focus on the drawbacks for one use-case (common as it
may be): a QString that is stored as-is in a class. You forget about all
the conversions and allocations that could be avoided if we didn't require
QString everywhere. E.g. local QStrings could be replaced by
QVarLengthArray<QChar> in code like this:
QString joinstr = QLatin1String("\"\n");
joinstr += indent;
joinstr += indent;
joinstr += QLatin1Char('"');
QString rc(QLatin1Char('"'));
rc += result.join(joinstr);
rc += QLatin1Char('"');
The same applies even when it is rewritten with QStringBuilder:
QString joinstr = QLatin1String("\"\n") + indent + indent
+ QLatin1Char('"');
QString rc = QLatin1Char('"') + result.join(joinstr) + QLatin1Char('"');
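For comparison, here is a sketch of the same quoting logic written against views, with std::u16string / std::u16string_view as stand-ins and a hypothetical `quoteJoined` function. It is not the QVarLengthArray<QChar> approach itself, but shows the same idea of avoiding temporaries. Because the separator and the pieces are only read, the result can be sized up front and built with at most one allocation, without a temporary joinstr.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical: '"' + join(parts, '"' '\n' indent indent '"') + '"'.
inline std::u16string quoteJoined(const std::vector<std::u16string_view>& parts,
                                  std::u16string_view indent) {
    // separator == "\"\n" + indent + indent + "\""
    const std::size_t sepSize = 2 + 2 * indent.size() + 1;

    std::size_t total = 2;  // opening and closing quote
    for (std::u16string_view p : parts)
        total += p.size();
    if (!parts.empty())
        total += (parts.size() - 1) * sepSize;

    std::u16string rc;
    rc.reserve(total);  // later appends never reallocate
    rc += u'"';
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) {
            rc += u"\"\n";
            rc += indent;
            rc += indent;
            rc += u'"';
        }
        rc += parts[i];
    }
    rc += u'"';
    return rc;
}
```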
But the main benefit would be that we only ever *need* one overload for a
QString-taking function, not N, N > 3. The API becomes comprehensible again,
while at the same time being near-maximally efficient[1].
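What "one overload instead of N" could look like, as a sketch: a hypothetical `StringViewSketch` (not the proposed QStringView's actual API) that is just a {pointer, size} pair with implicit constructors from several string-like std types. A single `countChar` function then accepts all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical non-owning {pointer, size} pair.
class StringViewSketch {
public:
    constexpr StringViewSketch(const char16_t* p, std::size_t n) : p_(p), n_(n) {}
    StringViewSketch(const std::u16string& s) : p_(s.data()), n_(s.size()) {}
    constexpr StringViewSketch(std::u16string_view s) : p_(s.data()), n_(s.size()) {}
    StringViewSketch(const std::vector<char16_t>& v) : p_(v.data()), n_(v.size()) {}
    // Assumes a NUL-terminated literal; the terminator is not counted.
    template <std::size_t N>
    constexpr StringViewSketch(const char16_t (&lit)[N]) : p_(lit), n_(N - 1) {}

    constexpr const char16_t* data() const { return p_; }
    constexpr std::size_t size() const { return n_; }

private:
    const char16_t* p_;
    std::size_t n_;
};

// One overload instead of N: counts occurrences of ch in any of the above.
inline std::size_t countChar(StringViewSketch s, char16_t ch) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s.data()[i] == ch)
            ++n;
    return n;
}
```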
And yes, this makes string handling in Qt even more complicated. Because
there's yet another class to learn.
Or maybe not. Because there'd (eventually) be fewer overloads, fewer _forgotten_
overloads(!), a simpler API, and _potential_ for efficiency. If users use
QString everywhere and ignore QStringRef and QStringBuilder _now_, that's
their problem, but it will continue to work. But if they hit a performance
problem, it would be nice to know that the API doesn't stand in the way of
fixing it.
In any case, I think by the time we prepare Qt 6, we will not have any
resources to go through all of Qt and do a transformation like this. That's
why I'd suggest to start introducing QStringView in Qt 5.7. Hopefully, by Qt
6, we have gained some experience and data that show how it compares to
QString-taking API, and how to go forward with it (or not).
Thanks,
Marc
[1] If you think the Qt string API isn't _that_ bad, please tell me:
a) how many op+ are missing when you don't compile with QT_USE_QSTRINGBUILDER?
b) how many QString::append() overloads are there (don't look!)?
c) for A, B ∈ {QChar, QLatin1Char, QString, QLatin1String, QStringRef},
does A == B compile? Is it efficient?
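On the "is it efficient?" part of c): since Latin-1 byte values coincide with the first 256 Unicode code points, a Latin-1 string and UTF-16 text can be compared code unit by code unit without converting either side. A sketch with std::string_view / std::u16string_view as stand-ins for QLatin1String / QStringView; `equalsLatin1` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: compares Latin-1 bytes with UTF-16 code units
// directly; neither side is converted and nothing is allocated.
inline bool equalsLatin1(std::string_view latin1, std::u16string_view utf16) {
    if (latin1.size() != utf16.size())  // each Latin-1 char is one UTF-16 unit
        return false;
    for (std::size_t i = 0; i < latin1.size(); ++i) {
        // Cast via unsigned char: plain char may be signed, which would
        // turn bytes >= 0x80 into negative values.
        if (char16_t(static_cast<unsigned char>(latin1[i])) != utf16[i])
            return false;
    }
    return true;
}
```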
--
Marc Mutz <marc.mutz at kdab.com> | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
Tel: +49-30-521325470
KDAB - The Qt Experts