[Development] QAnyStringView
Marc Mutz
marc.mutz at kdab.com
Tue Jun 23 11:35:05 CEST 2020
Hi,
I went to the drawing board and drew up a variant string view class.
It's here: https://codereview.qt-project.org/c/qt/qtbase/+/301594
Here's why I think we need it. At the end of the email, I also suggest
how we should go about introducing it into Qt.
Thiago and Lars are by now convinced that we need a QUtf8StringView,
too. Lars sees some merit for low-level APIs, Thiago remains
unconvinced.
I have come to believe that QUtf8StringView won't fly without
QAnyStringView: on its own, it will explode the number of mixed-type
operations we need to support. If we don't remove anything, we're
talking about
- QString
- QStringRef*
- QStringView
- QByteArray
- QByteArrayView
- QUtf8StringView
- QLatin1String
- char16_t
- QChar
- char8_t
- char
- QLatin1Char
- const char*
- const char16_t*
- const char8_t*
and anything I've forgotten. The best we can do to condense this is
to revoke the string-ness of QByteArray, and we'd be left with
- QStringView
- QLatin1String
- QUtf8StringView
- QChar
The latter would have to accept plain char again, something we marked
ASCII_DEPRECATED years ago, but which should be reconsidered under the
new src-is-UTF-8 paradigm.
Lars would probably say that we could also drop QLatin1String, which I
disagree with[1].
Assuming for the sake of argument that we need those four types,
consider QString::replace(). Experience shows that passing QStringBuilder
expressions requires an actual QString overload to be present, too, so
that makes five argument types. Ignoring existing overloads and regexp,
we'd need 5x5=25 overloads. I won't enumerate them here. What I will
enumerate is the complete set of overloads when using QAnyStringView:
QString& QString::replace(QAnyStringView, QAnyStringView,
Qt::CaseSensitivity);
That's it.
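To make the overload-count argument concrete outside of Qt, here is a minimal sketch in standard C++17 of a variant view that remembers which of three encodings it refers to. All names (Latin1, Utf8, AnyStringView, replaceAll) are hypothetical stand-ins, not Qt's API, and the UTF-8 decoder assumes well-formed input. A single replaceAll() covers every combination of argument encodings that would otherwise need its own overload:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical encoding tags; Qt's QLatin1String / QUtf8StringView differ.
struct Latin1 { std::string_view s; };
struct Utf8   { std::string_view s; };

// A pure interface type: it only refers to data and remembers its encoding.
class AnyStringView {
public:
    AnyStringView(Latin1 v) : v_(v) {}
    AnyStringView(Utf8 v) : v_(v) {}
    AnyStringView(std::u16string_view v) : v_(v) {}
    AnyStringView(const char16_t *s) : v_(std::u16string_view(s)) {}
    AnyStringView(const std::u16string &s) : v_(std::u16string_view(s)) {}

    // Analogue of toString(): the conversion lives here, once, not in callers.
    std::u16string toU16() const {
        return std::visit([](auto v) { return convert(v); }, v_);
    }

private:
    static std::u16string convert(std::u16string_view v) { return std::u16string(v); }
    static std::u16string convert(Latin1 v) {
        std::u16string out; // Latin-1 maps 1:1 onto UTF-16 code units
        for (unsigned char c : v.s) out.push_back(c);
        return out;
    }
    static std::u16string convert(Utf8 v) {
        std::u16string out; // assumes well-formed UTF-8, no error handling
        for (std::size_t i = 0; i < v.s.size();) {
            unsigned char c = v.s[i++];
            int extra = c < 0x80 ? 0 : c < 0xE0 ? 1 : c < 0xF0 ? 2 : 3;
            char32_t cp = extra == 0 ? c : c & (0x3F >> extra);
            while (extra-- > 0)
                cp = (cp << 6) | (static_cast<unsigned char>(v.s[i++]) & 0x3F);
            if (cp >= 0x10000) { // outside the BMP: emit a surrogate pair
                cp -= 0x10000;
                out.push_back(char16_t(0xD800 + (cp >> 10)));
                out.push_back(char16_t(0xDC00 + (cp & 0x3FF)));
            } else {
                out.push_back(char16_t(cp));
            }
        }
        return out;
    }

    std::variant<Latin1, Utf8, std::u16string_view> v_;
};

// One function instead of an N x N overload set.
std::u16string &replaceAll(std::u16string &str, AnyStringView before, AnyStringView after) {
    const std::u16string b = before.toU16(), a = after.toU16();
    if (b.empty())
        return str;
    for (std::size_t pos = str.find(b); pos != std::u16string::npos;
         pos = str.find(b, pos + a.size()))
        str.replace(pos, b.size(), a);
    return str;
}
```

Callers can mix encodings freely, e.g. replaceAll(s, Latin1{"-"}, u"--"); the function body is written once, and the per-encoding work moves into the view's conversion.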
Unlike QStringView, QAnyStringView is a pure interface type. I won't add
much in the way of parsing API to it, even though I acknowledge that's a
slippery slope. While it would be easy to add trimmed(), and tokenize()
would be really interesting, QAnyStringView should not be used for
parsing. That's what we have the three non-variant string view types
for. Being a pure interface type means we can add more "dangerous"
conversions. QStringView can't be constructed from a QStringBuilder,
e.g., because it's almost impossible to make that work without
referencing destroyed data:
QStringView s = u'c' + QString::number(x); // oops
QString c = u'c' + QString::number(x);
QStringView s = c; // ok
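But QAnyStringView supports this, since as a parameter type it only
needs to outlive the call, and the temporary is destroyed only at the
end of the full-expression: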
str.replace(name, name % "_1");
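The lifetime rule behind this distinction can be shown with the standard library, where std::string_view plays the role of QStringView. A temporary bound to a view-typed parameter lives until the end of the full-expression containing the call, so passing it is safe; storing it in a view variable is not. The helpers countChar(), number() and safeUse() are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Stand-in for QString::number(); returns a temporary string.
std::string number(int x) { return std::to_string(x); }

// A view-typed parameter: it only needs to outlive this call.
std::size_t countChar(std::string_view sv, char c) {
    std::size_t n = 0;
    for (char ch : sv)
        n += (ch == c);
    return n;
}

// Safe: the temporary "c" + number(x) is destroyed only after countChar()
// has returned, at the end of the full-expression.
std::size_t safeUse(int x) { return countChar("c" + number(x), '1'); }

// Unsafe, analogous to the "oops" line above (do not do this):
//     std::string_view dangling = "c" + number(x); // temporary dies here
//     countChar(dangling, '1');                    // reads destroyed data
```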
In summary: 25 overloads is just way too much (and don't forget regex,
which adds another five).
The replace() problem is also present with relational operators and
basically wherever we have two QString arguments right now.
QAnyStringView solves this in the sense that one overload can replace
many overloads. The complexity is still there: a binary visitation of
two QAnyStringViews produces nine (3x3) instantiations of the visitor
(though that can be reduced to six in many cases), but many
implementations fall into one of just two classes:
1) the function would just call toString() on the any-string-view
anyway, in which case the QString construction is taken out of user
code and centralized in the library. If you think that doesn't matter,
look at the tst_qstatemachine numbers in
https://codereview.qt-project.org/c/qt/qtbase/+/301595 (-10KiB just
from temporary QString creation and destruction);
2) the complexity is already there and QAnyStringView helps in reducing
it:
https://codereview.qt-project.org/c/qt/qtbase/+/303483 (QCalendar)
https://codereview.qt-project.org/c/qt/qtbase/+/303512 (QColor)
https://codereview.qt-project.org/c/qt/qtbase/+/303707 (arg())
https://codereview.qt-project.org/c/qt/qtbase/+/303708 (QUuid)
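To illustrate the visitation counts, here is a standard C++17 sketch of equality between two three-alternative variants (hypothetical names throughout; the decoders assume well-formed input). std::visit instantiates the visitor for all 3x3 = 9 alternative pairs, but because == is symmetric, only six pairs need an implementation of their own; the three reversed mixed pairs simply swap their arguments:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical stand-ins for QLatin1String, QUtf8StringView and QStringView.
struct Latin1 { std::string_view s; };
struct Utf8   { std::string_view s; };
using Utf16 = std::u16string_view;
using AnyView = std::variant<Latin1, Utf8, Utf16>;

// Slow path: decode to UTF-32 code points (assumes well-formed input).
std::u32string toU32(Latin1 v) {
    std::u32string out;
    for (unsigned char c : v.s) out.push_back(c);
    return out;
}
std::u32string toU32(Utf8 v) {
    std::u32string out;
    for (std::size_t i = 0; i < v.s.size();) {
        unsigned char c = v.s[i++];
        int extra = c < 0x80 ? 0 : c < 0xE0 ? 1 : c < 0xF0 ? 2 : 3;
        char32_t cp = extra == 0 ? c : c & (0x3F >> extra);
        while (extra-- > 0)
            cp = (cp << 6) | (static_cast<unsigned char>(v.s[i++]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
std::u32string toU32(Utf16 v) {
    std::u32string out;
    for (std::size_t i = 0; i < v.size(); ++i) {
        char32_t c = v[i];
        if (c >= 0xD800 && c < 0xDC00 && i + 1 < v.size()) // surrogate pair
            c = 0x10000 + ((c - 0xD800) << 10) + (v[++i] - 0xDC00);
        out.push_back(c);
    }
    return out;
}

struct Equal {
    // 3 same-encoding pairs: compare code units directly.
    bool operator()(Latin1 a, Latin1 b) const { return a.s == b.s; }
    bool operator()(Utf8 a, Utf8 b) const { return a.s == b.s; }
    bool operator()(Utf16 a, Utf16 b) const { return a == b; }
    // 3 mixed pairs; only Latin-1 vs UTF-16 may reject on size alone.
    bool operator()(Latin1 a, Utf16 b) const {
        return a.s.size() == b.size() && toU32(a) == toU32(b);
    }
    bool operator()(Latin1 a, Utf8 b) const { return toU32(a) == toU32(b); }
    bool operator()(Utf8 a, Utf16 b) const { return toU32(a) == toU32(b); }
    // The 3 reversed mixed pairs forward with swapped arguments.
    template <typename A, typename B>
    bool operator()(A a, B b) const { return (*this)(b, a); }
};

// One entry point; std::visit instantiates Equal for all 9 pairs.
bool equals(AnyView a, AnyView b) { return std::visit(Equal{}, a, b); }
```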
Another aspect that I'd like to mention is how QAnyStringView also helps
with getting rid of QLatin1String for Qt 7: Instead of having QL1S
strewn around the Qt API as we have now, we'd have just the
QAnyStringView(QLatin1String) ctor that we'd need to deprecate.
Finally, of course, QAnyStringView improves Qt's integration with
other C++ libraries, because it transparently accepts almost any
string type that exists out there (thanks to Peppe's Magic QStringView
ctor that QUtf8StringView and QAnyStringView inherit).
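A minimal sketch of the kind of constrained "accept any container" constructor meant here, in standard C++17. The class U16View and the third-party type are hypothetical, and Qt's actual constraint on the QStringView container constructor differs in detail; the idea is that any contiguous container whose data() yields const char16_t* and which has a size() is accepted without a dedicated overload:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical UTF-16 view with a constrained template constructor.
class U16View {
public:
    template <typename Container,
              typename = std::enable_if_t<std::is_convertible_v<
                  decltype(std::data(std::declval<const Container &>())),
                  const char16_t *>>>
    U16View(const Container &c) : data_(std::data(c)), size_(std::size(c)) {}

    const char16_t *data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const char16_t *data_;
    std::size_t size_;
};

// Hypothetical third-party string type that the view knows nothing about.
struct ThirdPartyString {
    std::u16string storage;
    const char16_t *data() const { return storage.data(); }
    std::size_t size() const { return storage.size(); }
};

// The constraint admits UTF-16 containers and rejects 8-bit ones.
static_assert(std::is_convertible_v<ThirdPartyString, U16View>);
static_assert(!std::is_convertible_v<std::string, U16View>);

// Any function taking the view accepts all matching containers.
std::size_t length(U16View v) { return v.size(); }
```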
I was very sceptical when, some months ago, someone on this ML
suggested making QString hold either UTF-8 or UTF-16 data, and I still
am, but in an explicit variant string view type, this concept suddenly
makes a lot of sense.
Now that I hopefully have convinced you that we need QAnyStringView,
where to go from here?
Given the lack of time until Qt 6.0, I'd like to propose that we just
replace all overload sets that contain QL1S with one overload taking
QAnyStringView.
The implementation usually contains the optimized handling of L1 data
already, and can often be easily extended to UTF-8, too, cf. QColor,
QUuid, arg().
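One reason the extension to UTF-8 is often cheap: when the accepted grammar is pure ASCII, as with a "#rrggbb" color name, Latin-1 and UTF-8 input can share the very same code path, because every non-ASCII code unit is rejected whichever encoding it belongs to. A hypothetical sketch in standard C++17 (parseHexRgb is not Qt's QColor code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Parses "#rrggbb". Char = char serves Latin-1 and UTF-8 alike;
// Char = char16_t serves UTF-16.
template <typename Char>
std::optional<std::uint32_t> parseHexRgb(std::basic_string_view<Char> s) {
    if (s.size() != 7 || s[0] != Char('#'))
        return std::nullopt;
    std::uint32_t rgb = 0;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const Char c = s[i];
        std::uint32_t digit;
        if (c >= Char('0') && c <= Char('9'))
            digit = c - Char('0');
        else if (c >= Char('a') && c <= Char('f'))
            digit = c - Char('a') + 10;
        else if (c >= Char('A') && c <= Char('F'))
            digit = c - Char('A') + 10;
        else
            return std::nullopt; // includes every non-ASCII code unit
        rgb = (rgb << 4) | digit;
    }
    return rgb;
}
```

A grammar that accepts non-ASCII characters would, of course, need real UTF-8 decoding.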
This should really happen for Qt 6, because it will greatly clean up our
lower-level APIs and tell a consistent story.
On top of that, we can also think of replacing overload sets that
contain QString and (QStringView or QStringRef) with one overload taking
QAnyStringView, or QString functions that typically get passed constants
(like setObjectName()), but I agree with Lars that there's not enough
time and man-power to bring this to a conclusion for Qt 6.
Thanks,
Marc
[1] First, we have a lot of existing QLatin1String use in code, both in
Qt itself and in code that has been checked with tools such as Clazy. Users of
QLatin1String know why they use this class - it's either to silence
QT_NO_CAST_FROM_ASCII or because there's a QLatin1String overload that
they call and that prevents a QString creation. Either way, these
developers will not react kindly if a recommended-in-Qt-5 solution
suddenly gets either removed or heavily pessimized in Qt 6.
Second, UTF-8 is a multi-byte encoding, like UTF-16. Unlike L1 ->
UTF-16, however, the number of code units needed to represent an L1
character in UTF-8 is not constant (one for ASCII, two for everything
else). That means that important optimisations like
bool operator==(QLatin1String lhs, QStringView rhs)
{ return lhs.size() == rhs.size() && ~~~~; }
no longer work:
bool operator==(QUtf8StringView lhs, QStringView rhs)
{ return lhs.size() == rhs.size() /* NOPE! */ && ~~~~; }
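The size mismatch is easy to check with the standard library. A minimal sketch, with the bytes of "café" spelled out explicitly so the example does not depend on the source-file encoding:

```cpp
#include <cassert>
#include <string_view>

constexpr std::string_view latin1 = "caf\xE9";      // Latin-1: 4 code units
constexpr std::string_view utf8 = "caf\xC3\xA9";    // UTF-8:   5 code units
constexpr std::u16string_view utf16 = u"caf\u00E9"; // UTF-16:  4 code units

// Each Latin-1 character is exactly one UTF-16 code unit, so equal strings
// always have equal size() and a size mismatch may short-circuit ==.
static_assert(latin1.size() == utf16.size());

// UTF-8 spends two code units on U+0080..U+00FF: equal strings can differ
// in size(), and a size check would wrongly report "not equal".
static_assert(utf8.size() != utf16.size());
```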
If you think this doesn't matter, think again: it's the reason why in
C++20 the original design of <=> was changed to only synthesize <, >,
<=, >= and no longer also ==, !=. If you still don't believe me, look at
some if-else chain that probably already exists somewhere (uic comes to
mind):
if (name == QLatin1String("name")) {
~~~~
} else if (name == QLatin1String("type")) {
~~~~ 50 other tokens ~~~~
} else {
// error
}
All of these start with a size check, whereas
if (name == "widget") {
~~~~
} else if (name == "type") {
~~~~ 50 other tokens ~~~~
} else {
// error
}
cannot. They immediately go into the strcmp loop. Now imagine there's a
rather common prefix to all these tags...
if (name == "qt_impl_widget") {
~~~~
} else if (name == "qt_impl_type") {
~~~~ 50 other tokens ~~~~
} else {
// error
}
and you see where I'm going.
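A rough way to see the cost is to count character comparisons. The sketch below (hypothetical helpers, standard C++17) contrasts a comparison that knows both sizes with a strcmp()-style loop that has to walk the shared "qt_impl_" prefix before it finds the first difference:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

struct Result {
    bool equal;
    std::size_t comparisons; // character comparisons performed
};

// Both sizes known (as with QLatin1String): differing lengths are rejected
// before a single character is inspected.
Result compareSized(std::string_view a, std::string_view b) {
    if (a.size() != b.size())
        return {false, 0};
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        ++n;
        if (a[i] != b[i])
            return {false, n};
    }
    return {true, n};
}

// strcmp()-style: no size up front, so walk until a mismatch or the
// terminating NUL (emulated here past the end of each view).
Result compareStrcmpStyle(std::string_view a, std::string_view b) {
    std::size_t n = 0;
    for (std::size_t i = 0;; ++i) {
        const char ca = i < a.size() ? a[i] : '\0';
        const char cb = i < b.size() ? b[i] : '\0';
        ++n;
        if (ca != cb)
            return {false, n};
        if (ca == '\0')
            return {true, n};
    }
}
```

Here compareSized("qt_impl_type", "qt_impl_widget") needs zero comparisons, while the strcmp()-style loop needs nine. The size check only helps when lengths differ; same-length tokens with a common prefix cost a prefix walk either way.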