[Development] How qAsConst and qExchange lead to qNN

Wed Nov 16 09:54:52 CET 2022

Hi Philippe,

On 15.11.22 12:50, Philippe wrote:
> On Tue, 15 Nov 2022 08:52:24 +0000
> Marc Mutz via Development <development at qt-project.org> wrote:
>> QAnyStringView, QUtf8StringView and, later, QAnyString, QUtf8String,
>> can be used to make UTF-8 a first-class citizen in Qt.
> 
> Maybe, but I see a deviation from simplicity (not "Qt-ish")

Do you? Then answer me this: what is QByteArray? It's not an array of 
bytes, because it has toLower() and toUpper(), which don't make sense 
for binary data. It also isn't a string, because, while its contents 
are to be treated as UTF-8, QByteArray(u8"ä").toUpper() != u8"Ä".
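To see why a byte container cannot get this right, here is a minimal sketch in standard C++ (no Qt). It assumes the container only knows ASCII case rules, which is roughly all an encoding-agnostic byte container can offer; `asciiToUpper` and the two constants are hypothetical names. In UTF-8, "ä" is the byte pair C3 A4 and "Ä" is C3 84, so no per-byte ASCII mapping gets from one to the other:

```cpp
#include <string>

// Hypothetical helper: byte-wise, ASCII-only upper-casing, i.e. what a
// container that does not know its encoding can reasonably implement.
std::string asciiToUpper(std::string s)
{
    for (char &ch : s) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b >= 'a' && b <= 'z')
            ch = static_cast<char>(b - 'a' + 'A'); // only a-z are touched
    }
    return s;
}

// "ä" (U+00E4) and "Ä" (U+00C4) encoded as UTF-8, spelled out byte by byte
// so the example does not depend on the encoding of the source file.
const std::string utf8LowerAUmlaut = "\xC3\xA4";
const std::string utf8UpperAUmlaut = "\xC3\x84";
```

`asciiToUpper(utf8LowerAUmlaut)` returns the input unchanged, because both of its bytes lie outside a-z, so it compares unequal to `utf8UpperAUmlaut`.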

Now, tell me: what's the Qt way to convert a UTF-8 string to lower-case?

Answer: fromUtf8().toLower().toUtf8(), i.e. a round trip through UTF-16.

How is any of this simple? An API isn't automatically simple just because it 
minimizes the number of classes. A simple API is one where each class 
has one responsibility, and one responsibility only. Don't confuse 
familiarity with simplicity. A new user will have no problem with an API 
where QByteArray is _only_ an octet-stream and QUtf8String is _only_ a 
UTF-8 string.

And don't even get me started on QByteArray::fromRawData().

The simplest possible string API is one where owning and non-owning 
containers are separate types, and we have different types for each 
major encoding, incl. binary:

Encoding | value_type | Owning        | Non-owning        | Remarks |
---------+------------+---------------+-------------------+---------+
Latin-1  | char?      | QLatin1String | QLatin1StringView | Qt 7    |
UTF-8    | char8_t    | QUtf8String   | QUtf8StringView   | C++20   |
UTF-16   | char16_t   | QString       | QStringView       |         |
---------+------------+---------------+-------------------+---------+
any of ^ | ---        | QAnyString    | QAnyStringView    |         |
---------+------------+---------------+-------------------+---------+
binary   | std::byte  | QByteArray    | QByteArrayView    |         |
---------+------------+---------------+-------------------+---------+

In practice, L1 strings are really US-ASCII, but it makes sense not to 
throw away the 8th bit and to go directly to L1. And no, UTF-8 is not a 
full replacement for L1/US-ASCII, because it is a variable-length 
encoding: every L1 byte is exactly one UTF-16 code unit, so op== between 
L1 and UTF-16 can start with a cheap size() check, and QString::insert(L1) 
knows up front how much room to make, while QString::insert(UTF-8) has to 
decode (or at least scan) its input first.
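The size() argument can be sketched in standard C++, with std::string_view and std::u16string_view standing in for QLatin1StringView and QStringView (both function names below are hypothetical). Because each Latin-1 byte is exactly one UTF-16 code unit, equality can reject on a size mismatch before touching the data. For UTF-8, the UTF-16 length is only known after scanning the input, which is also why an insert from UTF-8 cannot size its gap up front. The scanner assumes valid UTF-8 and does no error handling:

```cpp
#include <cstddef>
#include <string_view>

// Latin-1 vs UTF-16: each Latin-1 byte maps to one UTF-16 code unit with the
// same numeric value, so different sizes always mean different strings.
bool equalLatin1Utf16(std::string_view latin1, std::u16string_view utf16)
{
    if (latin1.size() != utf16.size())
        return false; // O(1) early-out; not possible for UTF-8
    for (std::size_t i = 0; i < latin1.size(); ++i) {
        if (static_cast<unsigned char>(latin1[i]) != utf16[i])
            return false;
    }
    return true;
}

// UTF-8: the byte count does not tell you the UTF-16 length, so you have to
// walk the lead bytes. Assumes well-formed input (no validation).
std::size_t utf16LengthOfUtf8(std::string_view utf8)
{
    std::size_t units = 0;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        units += (len == 4) ? 2 : 1; // 4-byte sequences become a surrogate pair
        i += len;
    }
    return units;
}
```

For example, "\xC3\xA4" ("ä") is two bytes of UTF-8 but a single UTF-16 code unit, so comparing its size() with a QString's size() would be meaningless.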

We have space for one more entry in QAnyStringView::Tag, so we could 
support UTF-32 or Local8Bit, too, going forward.
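For illustration, here is one way a view can carry its encoding without growing beyond a pointer plus a size: a hypothetical `TaggedView` that stores a 2-bit tag in the top bits of the size field. This is a simplified sketch under that assumption, not QAnyStringView's actual implementation. Two bits give four values; three are taken, leaving one spare for, say, UTF-32 or Local8Bit:

```cpp
#include <cstddef>
#include <limits>

// Hypothetical encoding tag; Spare is the free slot.
enum class Encoding : std::size_t { Utf8 = 0, Latin1 = 1, Utf16 = 2, Spare = 3 };

// Hypothetical tagged view: the two most significant bits of the size field
// hold the encoding, the remaining bits hold the length (which therefore has
// to fit into digits - 2 bits).
class TaggedView {
    static constexpr unsigned TagShift = std::numeric_limits<std::size_t>::digits - 2;
    static constexpr std::size_t SizeMask = (std::size_t(1) << TagShift) - 1;

    const void *m_data;
    std::size_t m_sizeAndTag;

public:
    TaggedView(const void *data, std::size_t size, Encoding enc)
        : m_data(data),
          m_sizeAndTag((size & SizeMask) | (static_cast<std::size_t>(enc) << TagShift)) {}

    std::size_t size() const { return m_sizeAndTag & SizeMask; }
    Encoding encoding() const { return static_cast<Encoding>(m_sizeAndTag >> TagShift); }
    const void *data() const { return m_data; }
};
```

Code that consumes such a view switches on `encoding()` and reinterprets `data()` accordingly; adding a fourth encoding means using the spare tag value and adding one more case.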

The type matrix above is an expressive and simple API. No more 
head-scratching over whether a QByteArray-taking function expects UTF-8 
or binary data. No more bugs because owning containers are sometimes not 
NUL-terminated, even though they promise always to be. No more segfaults 
because QString backing data was statically allocated and the library 
was already unloaded (this can happen not only on plugin unload, but also 
during shutdown, or so I'm told).

You may belittle these problems as irrelevant in practice, but they are 
the kind of problem that, when it strikes, leaves you dumbfounded, as 
opposed to the statically detectable lifetime issues of non-owning 
containers. I don't know about you, but I prefer *facepalm* problems 
readily diagnosed in the IDE over multi-night debugging sessions.

>> But UTF-16 is sacrosanct in Qt. It's a cult. Regardless of how many
>> deep copies it takes to convert between native encodings and UTF-16,
>> people still worship it as god-given. It's not.
> 
> If this is "sacrosanct", this is simply because many people appreciate
> QString for its rich API and ease of use. This has to be respected.

Note that I'm not proposing to remove QString in favour of std::u16string. But we 
have a way to abstract said rich API from the underlying storage now 
(via QStringView). The idea is to provide rich API and ease of use for 
UTF-8 strings, too.
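As a sketch of "rich API, separate from storage" in standard C++: a hypothetical free function `trimmed()`, written once against std::u16string_view, serves std::u16string, plain char16_t arrays, and any other contiguous UTF-16 storage without copying. The whitespace set is deliberately simplified to space, tab, and newline (Qt's own trimmed() uses a broader definition). The same pattern over a UTF-8 view is what would give UTF-8 strings the same API:

```cpp
#include <string>
#include <string_view>

// Hypothetical free function: works on any UTF-16 storage that converts to
// std::u16string_view, and returns a view into the caller's data (no copy).
std::u16string_view trimmed(std::u16string_view s)
{
    // Simplified whitespace set for the sketch.
    auto isSpace = [](char16_t c) { return c == u' ' || c == u'\t' || c == u'\n'; };
    while (!s.empty() && isSpace(s.front()))
        s.remove_prefix(1);
    while (!s.empty() && isSpace(s.back()))
        s.remove_suffix(1);
    return s;
}
```

The function never owns anything, so the caller keeps whatever storage suits it, while the algorithm is written exactly once.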

>
>> There's nothing inherently Qt-ish about owning containers.
>
> Yes and no, because owning containers are part of the very "Qt-ish"
> Implicit Sharing idiom, which one is _great_ for the ease of use, safety
> and optimization it provides.
>
> ~115 Qt classes: https://doc.qt.io/qt-6/implicit-sharing.html

Regardless of what you think about implicit sharing, you must agree that 
Qt has perverted it:

We kicked out the unsharable state, normally entered when handing out 
mutable references:

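    #include <QString>
    #include <cassert>
    int main()
    {
        QString c = "mouse";
        auto it = c.begin();     // non-const begin() detaches c first
        auto copy = c;           // copy now shares c's data (no unsharable state)
        *it = 'h';               // writes into the block both strings share...
        assert(copy == "mouse"); // ...so this fails: copy is "house", too
    }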

If CoW is sooooo superior, why did we kick out the industry-standard 
unsharable state? Because otherwise the performance sucked. Well, CoW 
performance sucks, especially for small strings, which are the majority 
of strings. Folly uses CoW only for strings > 256 chars, IIRC. But CoW in 
Qt is another cult, so we would rather break the correctness of our APIs 
than back away from the cult.

It saddens me to see the project so stuck in the 90s.

Thanks,
Marc

-- 
Marc Mutz <marc.mutz at qt.io>
Principal Software Engineer

The Qt Company
Erich-Thilo-Str. 10 12489
Berlin, Germany
www.qt.io

Geschäftsführer: Mika Pälsi, Juha Varelius, Jouni Lintunen
Sitz der Gesellschaft: Berlin,
Registergericht: Amtsgericht Charlottenburg,
HRB 144331 B