[Development] QAnyStringView

Wed Jun 24 11:09:11 CEST 2020

> On 24 Jun 2020, at 09:32, Marc Mutz via Development <development at qt-project.org> wrote:
> 
> Hi Thiago,
> 
> On 2020-06-24 02:36, Thiago Macieira wrote:
>> On Tuesday, 23 June 2020 02:35:05 PDT Marc Mutz via Development wrote:
>>> I have come to believe that QUtf8StringView without QAnyStringView won't
>>> fly: Introducing QUtf8StringView without QAnyStringView will explode the
>>> number of mixed-type operations we need to support.
>> Question, what are the "mixed-typed operations we need to support?". Where do
>> you see the need for this?
> 
> QString::replace(), relational operators, QXmlStream has several combinatorially-explosive overload sets, too.

Some of those have rather large overload sets, and I agree that we want to remove them, and QAnyStringView seems to a way to do this.

I think Thiago is asking in which parts of Qt we need this. For most of the API that we e.g. have in widgets and other places, I think the current QString only API is fine, and I don’t think we should start touching that. Besides, it would simply not be doable in the time frame we have for 6.0.

So let’s focus on the cases where we have large overload sets, and where QAnyStringView has a large positive impact for our users.

>>> The best we can do to condense this down is
>>> to revoke string-ness of QByteArray and we'd be left with
>>> - QStringView
>>> - QLatin1String
>>> - QUtf8StringView
>>> - QChar
>> Aside from places where an exception is worth it, our string API should:
>> - take QString by const-ref
>> - return QString by value
>> That condenses our four types to one for almost the entirety of Qt.
> 
> In times of QtMCU, we need to re-think whether owning containers use in APIs is really the way to go. In C, strings are const char*, which is a view. In Qt 3, the objectName ctor argument was a const char*. Things have gone downhill since then for no good reason (ie. no functionality was added), and a test like tst_qstatemachine adding 10KiB in text size for O(150) setObjectName() calls' forced creation of QStrings temps is just premature pessimisation at its best.

setObjectName() taking a const char * had its large share of issues in terms of memory management and ownership. I’m certain you don’t want to go back there.
> 
> My Qt 5-era changes to QLatin1String (adding QL1S::arg(), enabling QL1S as a type for date/time formats, overloading QtJSON functions for QL1S) have shown how dramatic the effect of 8-bit string values without constructing a QString first really is, without actually limiting functionality.
> 
> QStringView, QStringTokenizer, QUtf8StringView and QAnyStringView are the result of a multi-year analysis of string processing with Qt. I'm fully convinced they're the way to go for ease of use and performance at the same time. There's one and only one technical reason to continue to use QString in APIs: lack of implementation bandwidth.

I’m still not convinced that a global change would be beneficial. But let’s not waste time on this discussion right now. As you noted, even if we had consensus on doing this (we don’t currently), there is no way we could implement that change for 6.0. 

Since we are severely limited in implementation bandwidth, let’s focus on those places where we all agree that we add value and/or simplify maintenance by using QAnyStringView. This is by itself limiting this for the most part to Qt Core (with maybe a few exceptions outside, where something like QPainter::drawText() comes to my mind).

>> For new API that benefits from the exception, I'd reduce to two:
>> - QStringView
>> - QUtf8StringView
>> But the fact that you listed QChar in the first place indicates that you're
>> talking about the string classes themselves. Nothing else uses QChar in our
>> API. In that case, yes, QLatin1String and QChar are part of the overload set.
> [...]
> 
> I guess you consider QStringList::join() a string class, then. But otherwise, yes, I agree.

QString and related classes. So classes like QLocale and conversion from string in QDateTime are certainly related to string handling.
> 
>>> Assuming for the sake of argument that we need those four types,
>>> consider QString::replace(). Experience shows that stuff like
>>> QStringBuilder expressions being passed will require an actual QString
>>> overload to be present, too. Ignoring existing overloads and regexp,
>>> we'd need 5x5=25 overloads. I won't enumerate them here. What I will
>>> enumerate is the complete set of overloads when using QAnyStringView:
>>>    QString& QString::replace(QAnyStringView, QAnyStringView,
>>> Qt::CaseSensitivity);
>>> That's it.

Reducing the overload set to one method is the thing I like most about QAnyStringView. It significantly simplifies our live, and we can remove QL1S overloads pretty much anywhere.

>>> Unlike QStringView, QAnyStringView is a pure interface type. I won't add
>>> much in the way of parsing API to it, even though I acknowledge that's a
>>> slippery slope. While it would be easy to add trimmed(), and tokenize()
>>> would be really interesting, QAnyStringView should not be used for
>>> parsing. That's what we have the three non-variant string view types
>>> for. Being a pure interface type means we can add more "dangerous"
>>> conversions. QStringView can't be constructed from a QStringBuilder,
>>> e.g., because it's almost impossible to make that work without
>>> referencing destroyed data:
>>>    QStringView s = u'c' + QString::number(x); // oops
>>>    QString c = u'c' + QString::number(x);
>>>    QStringView s = c; // ok
>>> But QAnyStringView supports this:
>>>    str.replace(name, name % "_1");
>> That's not the same code. In one you're creating a view object and accessing
>> it later outside of the same statement; in the other, it is created and
>> accessed in the same statement. That is to say, the following works:
>>  void foo(QStringView str);
>>  foo(u'c' + QString::number(x));
>> and the following doesn't:
>> QAnyStringView s = u'c' + QString::number(x);
> 
> That's what I've been trying to say: since Q(Utf8)StringView is very good at parsing, QAnyStringView is very good at being an interface type and an interface type only. As such, we can allow ourselves some leeway in what (implicit) conversions we add to QAnyStringView that we actively rejected for QStringView.
> 
>>> QAnyStringView solves this in the sense that one overload can replace
>>> many overloads. The complexity is still there, a binary visitation of a
>>> QAnyStringView produces nine instantiations of the visitor (though that
>>> can be reduced to six in many cases), but many implementations fall into
>>> one of just two classes: 1) a function would just call toString() on the
>>> any-string-view, anyway, in which case the QString construction is taken
>>> out of user code and centralized in the library. If you think that
>>> doesn't matter, look at the tst_qstatemachine numbers in
>>>   https://codereview.qt-project.org/c/qt/qtbase/+/301595 (-10KiB just
>>> from temporary QString creation and destruction)
>> I'm leaning towards agreeing to use QAnyStringView in the string classes.
> 
> The part you're replying to is, however, a case of a traditional QString-only function: setObjectName().
> 
>> I'll remove my -2.
> 
> Thanks for that!
> 
>>> 2) the complexity is already there and QAnyStringView helps in reducing
>>> it:
>>>   https://codereview.qt-project.org/c/qt/qtbase/+/303483 (QCalendar)
>>>   https://codereview.qt-project.org/c/qt/qtbase/+/303512 (QColor)
>>>   https://codereview.qt-project.org/c/qt/qtbase/+/303707 (arg())
>>>   https://codereview.qt-project.org/c/qt/qtbase/+/303708 (QUuid)
>> Agreed on arg(), it's a great clean-up and performance improvement.
>> But it's part of QString itself. The other ones, however, are the slippery
>> slope. I agree they improve performance for sink-only functions, but we don't
>> *need* QAnyStringView for them. For example, for QCalendar, they could be the
>> QStringView/QUtf8StringView pair.
> 
> Remember that a great deal of performance improvement already came from adding QLatin1String::arg() and QStringView::arg(). You can say this is string-classes, but it really isn't. It's formatting: you take a format string, parse it, and produce some result based on it. We have tons of these in our API: date/time come to mind. If QAnyStringView for arg() is a good idea, so it is for any format string, and by a ever-so-slight extension, for any parser input. Which brings us to:
> 
>> My problem is not with the clean up that it provides, it's adding yet another
>> class to our API.
> 
> We seems to have agreement on using QAnyStringView for "string classes". If we do, this argument is moot, as the class will _be_ there. It's then only a little extra step to my proposal, since a) using QAnyStringView more widely makes for a more consistent string story in Qt 6 and b) meeting my proposed minimal step would eradicate QLatin1String from our APIs, reducing the newcomer-need-to-know API by one class. +QAnyStringView -QLatin1String = same number of classes.
> 
> That said, I'll happily repeat my mantra that fewer classes don't make an API easier to use, it's fewer responsibilities per class that does. And my design is very clean here:
> 
> - QAnyStringView is the interface type (and only that)
> - Q(Utf8)StringView are the parse types (via QAnyStringView::visit())
> - QString is a (possible) storage type (std::u16string and QVAL<char16_t> are others,
>  and I expect a lot more QVLA<char16_t> to be used in implementation as we move forward).
> 
> One class - exactly one responsibility. _That_ makes an API easy to use.
> 
>>> Now that I hopefully have convinced you that we need QAnyStringView,
>>> where to go from here?
>>> Given the lack of time until Qt 6.0, I'd like to propose to just replace
>>> all overload sets that contain QL1S with one overload taking
>>> QAnyStringView
>> Agreed for the string classes themselves.
> 
> I hope I've made my point about not stopping there.

Let’s start where we have agreement. I think we have that on string and related low level classes, especially in all places where we don’t take ownership of the argument.

We’ll run out of time to get this done anyway :)

>>> The implementation usually contains the optimized handling of L1 data
>>> already, and can often be easily extended to UTF-8, too, cf. QColor,
>>> QUuid, arg().
>> Those are likely candidates, yes.
>> I just don't want to give blanket approval for everything. There may be places
>> where the correct solution is to delete the QLatin1String overload and keep
>> only QString.
> 
> I disagree here. If there's already a QL1S overload, we must never go back to just QString.

I don’t think this is black and white. But in any case, we have a very limited amount of API outside of Qt Core where we have overloads taking a QL1S, so this might be mostly a theoretical discussion.
> 
> What instead we should look into is whether we can make a QString&&/QAnyStringView overload set work meaningfully (ie. no ambiguities for whatever the user passes). That would allow classes that actually store QStrings to allow transfer-of-ownership, at the cost of exposing an implementation detail. The main problem I see here is QStringBuilder.

The other problem is that it won’t transfer ownership for existing code in many places, as the QString&& overload won’t get used if the argument isn’t an rvalue (and it would fall back to QAnyStringView). But as said above, this does in any case not sounds like something we can do in time for 6.0.

Cheers,
Lars