[Development] RFC: Proposal for a semi-radical change in Qt APIs taking strings

Knoll Lars Lars.Knoll at theqtcompany.com
Wed Oct 14 13:15:55 CEST 2015


On 14/10/15 12:16, "Marc Mutz" <marc.mutz at kdab.com> wrote:

>On Wednesday 14 October 2015 08:37:19 Knoll Lars wrote:
>> I’m not a huge fan of having different overloads with QString,
>>QStringRef
>> and QLatin1String and in some cases (QChar *, int) for many methods
>> neither. But while your proposal solves some problems it introduces
>>others.
>> 
>> A QStringView class would only work for methods that read the data
>> contained in it, but don’t try to modify it or take a copy (as Thiago
>> pointed out).
>
>I do not agree with that statement.
>
>First, afaiu from what Thiago mentions in reviews, Q6String will have SSO
>(small-string-optimisation) which makes many short strings expensive to
>copy 
>(if you think that copying 24 bytes is slower than upping an atomic int
>through an indirection) or cheap to copy (if you think the opposite). In
>any 
>case, small strings will be very cheap to create (no allocation), so for
>many 
>strings there will be not much difference between passing a QStringView
>or 
>passing a QString.

I think Thiago should expand a bit on hit plans so we see how the
different pieces will fit together.
>
>Second, upon modification, the QString will detach (make a copy), and
>_then_ 
>perform the operation. With a QStringView and an efficient base of
>operations, 
>those two operations can be folded into one (basically as the const
>versions 
>of QString methods, where they exist, do (or should do)). Unless the
>operation 
>can and will actually be done in the original allocation (ie. incl. no
>detach), the const methods should be faster. That will never be the case
>when 
>you pass QString by const-&, because there will always be the lvalue
>parameter 
>attached to the QString instance. For modification in-place to work, you
>need 
>to pass by rvalue ref. So for typical functions modifying the string,
>there's 
>also no difference between QString and QStringView.

Agreed. As soon as you modify the string, it makes no difference, unless
maybe you at the same time hand over the string to the called function.
>
>That leaves classes which simply store the string. You cited QUrl. I
>don't see 
>a problem providing QString overloads for these, esp. considering that
>we're 
>starting out with an all-QString API here. Then again, once we have
>QStringView overloads, we can simply disable the QString overloads and
>see the 
>effect.

I think there’s actually quite a few of those. In addition, it might be
tricky to use QStringView in signals and slots.
>
>BTW: functions storing a passed QString as-is should provide a QString&&
>overload, and that might be a good idea even when otherwise using
>QStringView 
>only.

Yes, agree with this.
>
>> And you certainly can’t keep the pointer to the data around
>> for longer than the lifetime of the QStringView, so it’s to some extent
>>an
>> advanced class you have to be careful when using in your own APIs.
>
>It's like the distinction between QModelIndex and QPersistentModelIndex.
>The 
>first is an interface type, the latter the storage type. Neither is more
>"advanced" than the other. They are complements.
> 
>> So it can work nicely for methods such as QString::indexOf and similar,
>> but will never be good for methods that need to copy the string (e.g.
>> QUrl::setHostName).
>> 
>> 
>> Another thing I wonder about is whether we shouldn’t deprecate
>> QLatin1String moving forward. We have QStringLiteral, and even though
>>it’s
>> implementation is not ideal, we should be able to get it working
>> everywhere now with Qt 5.7. Let’s think about how and whether we can
>> improve it’s implementation to fix the remaining issues. Then we could
>> remove/deprecate QLatin1String.
>
>There are problems in QStringLiteral that cannot be solved. Common data
>sharing will never happen with the current syntax. I'd suggest a
>QStaticString, a fully constexpr wrapper around QStaticStringData<N>,
>basically to determine the N transparently, which can be used as a
>variable at 
>namespace scope in lieu of the current need to pack all QStringLiterals
>into 
>static inline functions. But that's outside the scope of this thread, so
>let's 
>not go there.

Yes, that’s what I originally wanted with QStringLiteral. Unfortunately
it’s semantics then got changed to return a full QString object.

>
>> On 13/10/15 23:01, "Thiago Macieira" <thiago.macieira at intel.com> wrote:
>> >On Tuesday 13 October 2015 22:46:36 Marc Mutz wrote:
>> >> Q: What mistakes do you refer to?
>> >>
>> >> 
>> >>
>> >> A: The fact that it has copy ctor and assignment operator, so it's
>>not a
>> >> trivally-copyable type and thus cannot efficiently passed by-value.
>>It
>> >>
>> >>may
>> >>
>> >> also be too large for pass-by-value due to the rather useless QString
>> >> pointer (should have been QStringData*, if any). Neither can be fixed
>> >> before Qt 6.
>> >
>> >Not even in Qt 6. The reason why it uses a QString pointer is that it
>> >follows 
>> >the QString through reallocations. If the QString is mutated, the
>> >QStringRef 
>> >will still be valid (provided it isn't shortened beyond the substring
>>the
>> >QStringRef points to). There's a lot of code that depends on this, so
>>we
>> >can't 
>> >change it.
>
>QString foo = "foo";
>QStringRef ref = foo.midRef(1); // ref == "oo";
>foo = "bar"; // oops, ref == "ar";

Agree with Marc here, that this is not good/obvious behavior, and we
actually shouldn’t support it.
>
>We could change it to hold QString::Data* instead, though, right? And
>make it

Not in Qt 5, as the m_string pointer is accessed inline.

> 
>share ownership of the QString::Data, in which case we have a QString
>that has 
>position and size inline. Or, if it doesn't participate in the ownership,
>we

If we’re talking Qt6, this might not be an issue at all depending on how
Thiago’s SSO plans are. If position and size are stored in QString, not
the QStringData, we could simply merge QString and QStringRef.

> 
>can start returning QStringRef from QStringLiteral(Ref?), killing one
>major 
>QSL problem (out-of-line QString dtor litter).
>
>> Only by deprecating QStringRef and not using it ourselves anymore. But
>> it’s used quite a lot in Qt, so this is no easy job and will certainly
>> break source compatibility in places such as the XML stream reader.
>> 
>> >> Q: Why size_t?
>> >>
>> >> 
>> >>
>> >> A: The intent of QStringView (and std::experimental::string_view) is
>>to
>> >>
>> >>act
>> >>
>> >> as an interface between modules written with different compilers and
>> >> different flags. A std::string will never be compatible between
>> >>
>> >>compilers
>> >>
>> >> or even just different flags, but a simple struct {char*, size_t}
>>will
>> >> always be, by way of it's C compatibility.
>> >>
>> >> 
>> >>
>> >> So the goal is not just to accept QString, QStringRef, and
>>(QChar*,int)
>> >>
>> >>(and
>> >>
>> >> QVarLengthArray<QChar>!) as input to QStringView, but also
>> >> std::basic_string<char16_t> and std::vector<char16_t>.
>> >
>> >The C++ committee's current stance on signed vs unsigned is that you
>> >should 
>> >use signed for everything, except when you want to have modulo-2
>> >overflows. 
>> >We're not overflowing, so it should be signed.
>> 
>> Yes, signed please. We can discuss whether it should be 64bit for Qt 6.
>
>The current std API uses size_t. Do you (= both of you) expect that ever
>to 
>change? If it doesn't, Qt will forever be the odd one out, until we
>finally 
>drop QVector etc for std::vector etc and then porting will be a horror
>because 
>of MSVC's annoying warnings.

Using unsigned will give us exactly the same problem now. So I don’t see
how this would change anything except timing in this scenario.
>
>[...]
>> >> Q: What about QLatin1String?
>> >>
>> >> 
>> >>
>> >> A: Once QString is backed by UTF-8, latin-1 ceases to be a special
>> >>
>> >>charset.
>> >>
>> >> We might want something like QUsAsciiString, but it would just be a
>> >>
>> >>UTF-8
>> >>
>> >> string, so it could be packed into QStringView.
>> >
>> >Since QString will not be backed by UTF-8, the answer is irrelevant.
>
>I don't know who I spoke to at QtWS, but they made it sound differently,
>citing 
>Mozilla as having converted back to UTF-8 after using a fixed-width
>string type 
>for some time. 

Mozilla actually had 5-7 different string classes for many years IIRC.
That’s certainly something I don’t want to do in Qt.

>And I found that reasonable. After all, most text _is_ us-
>ascii, even in Chinese native applications, and seeing as memory
>bandwidth is 
>the limiting factor these days and will continue to be for the forseeable
>future, a more compact string storage should win the day. Esp. for SSO,
>having 
>possibly twice as much characters should go a long way to avoid
>allocations.

It’s worthwhile discussing, but any such change would have huge
implications on our QString API. In any case, it’s nothing we can do in Qt
5.
>
>But that's another topic, too.

Agreed.
>
>> Agree here as well. We can’t make QString utf-8 backed without breaking
>> way too much code. I also don’t see the need for it. The native encoding
>> on Windows and Mac (Cocoa) is utf-16 as well, on Linux it’s utf-8. So no
>> matter which platform we’re on, we won’t avoid some conversions.
>> 
>> And I will strongly oppose any attempts to make QString some sort of
>> hybrid supporting both. The added complexity in maintaining the code
>>base
>> is simply not worth it.
>> 
>> >> Q: What about QByteArray, QVector?
>> >>
>> >> 
>> >>
>> >> A: I'm unsure about QByteArrayView. It might not pull its weight
>> >>
>> >>compared to
>> >>
>> >> std::(experimental::)string_view, but I also note that we're
>>currently
>> >> missing a QByteArrayRef, so a QBAView might make sense while we wait
>>for
>> >> the std one to become available to us.
>> >
>> >Given the mistakes that you and I are pointing out in QStringRef, we
>> >should 
>> >not add QByteArrayRef. Instead, it should be in the new-style, in which
>> >case I 
>> >wonder whether we should add a class in the first place. And moreover,
>> >how 
>> >often is this needed? std::array_view should be plenty for QByteArray
>>and
>> >QVector where needed.
>
>array_view cannot compete with QByteArray's API. E.g. there's no toInt().
>The 
>need is there. I often see code that would benefit from a non-allocating
>Ref, 
>but cannot because it's operating at the QBA level. So I do see a
>use-case for 
>QByteArrayView. Or Ref.
>
>> Agreed as well.
>> 
>> >> I'm actively opposed to a QArrayView, because I don't think it
>>provides
>> >>
>> >>us
>> >>
>> >> with anything std::(experimental::)array_view doesn't already.
>> >
>> >Right.
>> >
>> >> Q: What do you mean when you say "abandon QString"?
>> >>
>> >> 
>> >>
>> >> A: I mean that functions should not take QStrings as arguments, but
>> >> QStringViews. Then users can transparently pass QString, QStringRef
>>and
>> >>
>> >>any
>> >>
>> >> of a number of other "string" types without overloading the function
>>on
>> >> each of them.
>> >>
>> >> 
>> >>
>> >> I do not mean to abandon QString, the class. Only QString, the
>>interface
>> >> type.
>> >
>> >I'm not agreeing to the proposal just yet.
>> >
>> >But as a condition to be even considered, it needs to be only for the
>> >methods 
>> >that do not hold a copy of the string. That is, methods that
>>immediately
>> >consume the string and no longer need to reference its contents.
>
>Thiago, I think it would help the discussion if you quickly summarised
>your 
>planned changes to QString in Qt 6.

+1.
>
>AFAIK, the size and offset will move into the object, so I expected that
>Q6String would subsume QStringRef, because each QString could provide a
>separate view on the shared underlying data. I also was led to believe
>that 
>Q6String would use SSO, which, given its inceased sizeof(), would make a
>lot 
>of sense, imo.

If QString would anyway be large enough and we can encode the information
on whether it’s inline without using additional storage, then yes, that
could make a lot of sense.
>
>And then I thought, QString would be converted to hold UTF-8. I saw
>wip/qstring-utf8 fly by on gerrit, but ok, that hasn't received any
>updates 
>since 2012.
>
>> >Methods that keep a copy for any reason (e.g., QFile::setFilename)
>>should
>> >still keep a QString API so that they can participate in the reference
>> >counting.
>> 
>> Yes, we can’t do it differently. That immediately brings up the problem
>>of
>> how to make things future proof. Suppose we have an API that takes a
>> QStringView because we’re not taking a copy of the string. Two minor
>> releases later we find out that we need a copy for some reason
>> nevertheless. What do we do?
>
>We take a deep copy and measure. If and only if there's a benefit in
>real-world 
>applications, we can add a QString overload back.
>
>Has anyone, ever, measured the average data sharing from QString COW? Or
>the 
>speedup? It should be simple: just add a detach() to the copy ctor and
>assignment operator, compile (in C++11!!) and benchmark memory
>use/fragmentation and cpu cycles used. Of, say, QtDesigner or QtCreator.
>Herb 
>Sutter's benchmark is more than 15 years old, I wonder how it would
>behave 
>under the increased spread of cache and main memory.
>
>I once compiled a log of string operations for QtDesigner, but GCC didn't
>want 
>to compile the resulting benchmark (stopped it after ~1h) and I don't
>have the 
>data anymore.
>
>> I’m not saying that a QStringView might not be a good idea, but we have
>>to
>> be careful where we use it. I’d say that for most cases we want to
>> continue to pass a const QString & into the methods. QStringView would
>>be
>> reserved for performance critical parts of the API, which 90% of our API
>> is not.
>
>None of us know which part of the API is performance critical. We're
>developing a library, and we can't possibly know what all users are doing
>with 
>it. Who's to say QDir isn't someone's bottleneck? And who's to say all
>users 
>report this kind of stuff, if it happens to their app?

Of course we don’t know all it’s uses. But many uses outside of QtCore are
clearly less critical. QLineEdit::setText is clearly not called in tight
loops, and once you set the text it has to do lots of other work. There
are many similar APIs in Qt, where I don’t think we’ll ever see a benefit
of a QStringView, and the simplicity of passing in a const QString ref is
probably preferrable.
>
>Take QDateTime as a warning. Take a 3x speedup in QString::multiArg() as
>a 
>sign that every cloud has a silver lining.

I’ve done enough of these optimizations myself, yes. And I agree with you
that we need to be extremely careful with all our low level classes. But I
don’t think you’ll see any measurable difference when we go to the API of
high level classes such as QFileDialog.
>
>To get back on topic: You focus on the drawbacks of one use-case (common
>as it 
>may be) for a QString that is stored as-is in a class. You forget about
>all 
>the conversions and allocations that are prevented by requiring QString
>everywhere. E.g. local QStrings could be replaced by
>QVarLengthArray<QChar>,
>e.g. in code like this:
>
>    QString joinstr = QLatin1String("\"\n");
>    joinstr += indent;
>    joinstr += indent;
>    joinstr += QLatin1Char('"');
>
>    QString rc(QLatin1Char('"'));
>    rc += result.join(joinstr);
>    rc += QLatin1Char('"');
>
>Even when replaced with QStringBuilder:
>
>    QString joinstr = QLatin1String("\"\n") + indent + indent
>                       + QLatin1Char('"');
>
>    QString rc = QLatin1Char('"') + result.join(joinstr) +
>QLatin1Char('"');
>
>But the main benefit would be that we only ever *need* one overload for a
>QString-taking function, not N, N > 3. The API becomes comprehensible
>again, 
>while at the same time being near-maximally efficient[1].

I don’t think we’re disagreeing here. I do not like the amount of
overloads we have neither.
>
>And yes, this makes string handling in Qt even more complicated. Because
>there's yet another class to learn.

It doesn’t have to as you’re saying yourself below. All I want is that we

* look at this and find good rules on when to use which class in our APIs
* Take what we know about how things will look in the future into account
(ie. Qt6)
>
>Or maybe not. Because there'd be (eventually) less overloads, less
>_forgotten_ 
>overloads(!), simpler API, and _potential_ for effciency.   If users use
>QString everywhere and ignore QStringRef and QStringBuilder _now_, that's
>their problem, but it will continue to work. But if they hit a
>performance 
>problem, it would be nice to know that the API doesn't stand in the way
>of 
>fixing it.
>
>In any case, I think by the time we prepare Qt 6, we will not have any
>resources to go through all of Qt and do a transformation like this.
>That's 
>why I'd suggest to start introducing QStringView in Qt 5.7. Hopefully, by
>Qt 
>6, we have gained some experience and data that show how it compares to
>QString-taking API, and how to go forward with it (or not).

I am certainly in favor of experimenting with this. Let’s start in a
branch or behind an ifdef. Once we gained some confidence let’s make it
public and start using it.
>
>Thanks,
>Marc
>
>[1] If you think the Qt string API isn't _that_ bad, please tell me:
>  a) how many op+ are mssing when you don't compile with
>QT_USE_QSTRINGBUILDER

Yes, that’s stupid. It should have been enforced with Qt 5.0 :(

>  b) how many QString::append() overloads are there (don't look!)?

Way too many ;-)

Again, I’m agreeing with you

>  c) for A, B \in {QChar, QLatin1Char, QString, QLatin1String,
>QStringRef},
>     does A == B compile? Is it efficient?

Chars should be different than strings. Unfortunately we do have
constructors for QString’s from a QChar.

In any case, if we’re talking QString and Qt 6, I think we should explore
whether we can:

* get rid of QLatin1String somehow
* merge QString and QStringRef (possible if pos and size are in QString)
* check how we can do small string optimizations
* have a QStringLiteral done right as a replacement for const char
*/QL1String
* explore how to handle normalization forms, string equility and related
questions

A QStringView will probably help with some of these items.

Cheers,
Lars



More information about the Development mailing list