[Development] How qAsConst and qExchange lead to qNN
Marc Mutz
marc.mutz at qt.io
Wed Nov 16 09:54:52 CET 2022
Hi Philippe,
On 15.11.22 12:50, Philippe wrote:
> On Tue, 15 Nov 2022 08:52:24 +0000
> Marc Mutz via Development <development at qt-project.org> wrote:
>> QAnyStringView, QUtf8StringView and, later, QAnyString, QUtf8String,
>> can be used to make UTF-8 a first-class citizen in Qt.
>
> Maybe, but I see a deviation from simplicity (not "Qt-ish")
Do you? Then answer me this: what is QByteArray? It's not an array of
bytes, because it has toLower() and toUpper(), which don't make sense
for binary data. It also isn't a string, because, while its contents
are to be treated as UTF-8, QByteArray(u8"ä").toUpper() != u8"Ä".
Now, tell me: what's the Qt way to convert a UTF-8 string to lower-case?
Answer: fromUtf8().toLower().toUtf8()
How is any of this simple? An API isn't automatically simple just because it
minimizes the number of classes. A simple API is one where each class
has one responsibility, and one responsibility only. Don't confuse
familiarity with simplicity. A new user will have no problem with an API
where QByteArray is _only_ an octet-stream and QUtf8String is _only_ a
UTF-8 string.
And don't even get me started on QByteArray::fromRawData().
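To make the toUpper() point concrete, here is a minimal sketch in plain C++ (no Qt). It assumes the byte container can only map ASCII letters, byte by byte; asciiToUpper() is a hypothetical stand-in for such a function, not Qt's implementation:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: upper-cases ASCII letters only, one byte at a
// time, which is all a container without a known encoding can do.
std::string asciiToUpper(std::string bytes)
{
    for (char &c : bytes) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>(c - 'a' + 'A');
    }
    return bytes;
}

// UTF-8 for U+00E4 (ä) and U+00C4 (Ä), spelled out as raw bytes.
const std::string smallAUmlaut = "\xC3\xA4";
const std::string capitalAUmlaut = "\xC3\x84";
```

With these definitions, asciiToUpper("abc") yields "ABC", but asciiToUpper(smallAUmlaut) returns its input unchanged: neither 0xC3 nor 0xA4 is an ASCII letter, so the result never equals capitalAUmlaut. Getting "Ä" requires decoding the UTF-8 first, which is what the fromUtf8()/toUtf8() round trip does.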
The simplest possible string API is one where owning and non-owning
containers are separate types, and we have different types for each
major encoding, incl. binary:
Encoding | value_type | Owning        | Non-owning        | Rem.  |
---------+------------+---------------+-------------------+-------+
Latin-1  | char?      | QLatin1String | QLatin1StringView | Qt 7  |
UTF-8    | char8_t    | QUtf8String   | QUtf8StringView   | C++20 |
UTF-16   | char16_t   | QString       | QStringView       |       |
---------+------------+---------------+-------------------+-------+
any of ^ | ---        | QAnyString    | QAnyStringView    |       |
---------+------------+---------------+-------------------+-------+
binary   | std::byte  | QByteArray    | QByteArrayView    |       |
---------+------------+---------------+-------------------+-------+
L1 is really US-ASCII, but it makes sense to not throw away the 8th bit,
and go directly to L1. And no, UTF-8 is not a full replacement for
L1/US-ASCII, because it is a variable-length encoding (size() check for
op==, also think how to implement QString::insert(UTF-8) vs.
QString::insert(L1) efficiently to see why).
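The size() argument can be sketched in plain C++ (no Qt). equalsLatin1() is a hypothetical helper, written only to show why a fixed-width encoding allows an O(1) early exit that UTF-8 does not:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: compares UTF-16 text against Latin-1 text.
// Every Latin-1 byte maps to exactly one UTF-16 code unit, so
// differing sizes prove inequality without touching the data.
bool equalsLatin1(std::u16string_view utf16, std::string_view latin1)
{
    if (utf16.size() != latin1.size())
        return false; // O(1) early exit
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        if (utf16[i] != static_cast<unsigned char>(latin1[i]))
            return false;
    }
    return true;
}
```

The same shortcut is not available for UTF-8: "ä" is one UTF-16 code unit (u"\u00E4") but two UTF-8 bytes ("\xC3\xA4"), so equal strings can have different sizes and the comparison has to decode. The same mismatch is why an insert of UTF-8 data cannot know up front how many UTF-16 code units it will need.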
We have space for one more entry in QAnyStringView::Tag, so we could
support UTF-32 or Local8Bit, too, going forward.
This is an expressive and simple API. No more head-scratching over whether
a QByteArray-taking function expects UTF-8 or binary data. No more bugs
because owning containers are sometimes not NUL-terminated, even though
they promise to always be. No more segfaults because QString backing
data was statically allocated and the library was already unloaded (this can
happen not only on plugin unload, but also during shutdown, or so I'm told).
You may belittle these problems as irrelevant in practice, but they are
the kind of problem that, when it strikes, leaves you dumbfounded, as
opposed to statically-detectable lifetime issues with non-owning
containers. I don't know about you, but I prefer *facepalm* problems
readily diagnosed in the IDE over multi-night debugging sessions.
>> But UTF-16 is sacrosanct in Qt. It's a cult. Irregardless of how many
>> deep copies it takes to convert to and from UTF-16 from native
>> encodings, people still worship it as god-given. It's not.
>
> If this is "sacrosanct", this is simply because many people appreciate
> QString for its rich API and ease of use. This has to be respected.
Note how I'm not saying to remove QString in favour of u16string. But we
have a way to abstract said rich API from the underlying storage now
(via QStringView). The idea is to provide rich API and ease of use for
UTF-8 strings, too.
>
>> There's nothing inherently Qt-ish about owning containers.
>
> Yes and no, because owning containers are part of the very "Qt-ish"
> Implicit Sharing idiom, which one is _great_ for the ease of use, safety
> and optimization it provides.
>
> ~115 Qt classes: https://doc.qt.io/qt-6/implicit-sharing.html
Regardless of what you think of implicit sharing, you must agree that
Qt has perverted it:
We kicked out the unsharable state, normally entered when handing out
mutable references:
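    QString c = "mouse";
    auto it = c.begin();      // hands out a mutable iterator into c's data
    auto copy = c;            // copy now shares that same data
    *it = 'h';                // writes through the old iterator, no detach
    assert(copy == "mouse");  // nope: it's "house", now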
If CoW is sooooo superior, why did we kick out the industry-standard
unsharable state? Because otherwise the performance sucked. Well, CoW
performance sucks. Esp. for small strings, which are the majority of
strings. Folly uses CoW only for strings > 256 chars, IIRC. But CoW in
Qt is another cult, so we would rather break the correctness of our APIs
than back off from the cult.
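The effect of dropping the unsharable state can be reproduced without Qt. CowString below is a toy copy-on-write string built on std::shared_ptr, an assumption made purely for illustration and not how QString is implemented; it detaches on mutable access but never marks its buffer as unsharable:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy copy-on-write string, NOT Qt's implementation: copies share one
// buffer, and mutable access detaches first if the buffer is shared.
class CowString
{
public:
    explicit CowString(const char *s) : d(std::make_shared<std::string>(s)) {}

    // Mutable access: ensure exclusive ownership at the time of the call.
    char *begin()
    {
        if (d.use_count() > 1)
            d = std::make_shared<std::string>(*d); // detach
        return &(*d)[0];
    }

    const std::string &str() const { return *d; }

private:
    std::shared_ptr<std::string> d;
};

// Replays the sequence from the snippet: iterator first, copy second.
std::string copyAfterWriteThroughOldIterator()
{
    CowString c("mouse");
    char *it = c.begin();   // c is unshared, so nothing to detach
    CowString copy = c;     // copy now shares c's buffer
    *it = 'h';              // no detach: 'it' predates the copy
    return copy.str();      // the copy changed along with c
}
```

Here copyAfterWriteThroughOldIterator() returns "house". An unsharable state would have begin() flag the buffer, so that the later copy is forced to be a deep one and stays "mouse"; the price is a deep copy every time a mutable reference has been handed out.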
It saddens me to see the project so stuck in the 90s.
Thanks,
Marc
--
Marc Mutz <marc.mutz at qt.io>
Principal Software Engineer
The Qt Company
Erich-Thilo-Str. 10 12489
Berlin, Germany
www.qt.io
Geschäftsführer: Mika Pälsi, Juha Varelius, Jouni Lintunen
Sitz der Gesellschaft: Berlin,
Registergericht: Amtsgericht Charlottenburg,
HRB 144331 B