[Development] Views in APIs

Wed May 13 11:43:14 CEST 2020

On 2020-05-13 09:04, Lars Knoll wrote:
>> On 12 May 2020, at 18:59, Marc Mutz <marc.mutz at kdab.com> wrote:
[...]
>>> Most other classes:
>>> * Only take and return QString
>> 
>> This is wrong. By taking a QString in the API of non-string-related 
>> APIs, you expose QString as an implementation detail of the class. 
>> Think QRegion, which, before I added begin()/end(), had a SSO-like 
>> internal container (1 QRect or QVector<QRect>) which forced most users 
>> to code around the owning-container API like this:
> 
> I don’t think you can compare those cases. QString is the container
> our users use in 99% of the cases to hold a string. And this won’t
> change (and I don’t think we should advocate any changes here).
> 
> So then, an API taking and returning a QString is the most logical,
> easiest and most convenient to use.

There are two independent issues here: taking and returning.

For taking I don't see how "taking QString is most logical, easiest, 
most convenient" follows. It's simply not true that 99% of the cases to 
hold a string use QString. In the common case of applications which are 
not localized, all the nice QString APIs are fed with const char* 
literals, causing QString creation and destruction in user code. Even if 
you change all these to const char16_t* and a QString that doesn't 
allocate in this case, you still have all these complex (in the 
QTypeInfo sense) QString objects created and destroyed at the call site. 
This bloats user code. If all those shiny QString APIs would take 
QStringView instead, the construction and destruction of which is 
trivial (in the C++ sense), the compiler can remove all those calls and 
the QString construction (if any) happens centrally inside the library 
function (O(1) instead of O(N), N = #callers).

In the not so hypothetical case that Qt is used to visualize results of 
some business calculations, chances are that thrid-party libraries will 
use std::string or std::u16string, and not QString, requiring the use of 
QString::fromStdString() to pass these to a QString API. Had the API 
taken QStringView, no extra code would have been necessary.

So, I have shown that taking QString is neither the most convenient, nor 
the most easy, choice, as it requires users with other string types as 
data sources to jump through hoops to pass their data. Only if you 
employ a very narrow focus where both efficiency and the existence of 
3rd-party code are all ignored, can you still maintain that a function 
taking an owning container is more convenient and easier than one taking 
a view. The only logic in such an API that _I_ can find, however, is 
"MUST ... NOT ... BREAK ... COW logic", which has been proven as flawed 
over twenty years ago:

http://www.gotw.ca/gotw/043.htm
http://www.gotw.ca/gotw/044.htm
http://www.gotw.ca/gotw/045.htm

(and CoW isn't even correctly implemented in Qt, ever since unsharable 
data was removed).

In case the internal storage _is_ QString, then providing a QString&& 
overload to avoid the copy is a good idea, if you're willing to impart 
the details of the implementation that way.

Second, returning.

You talk about QString getting an SSO buffer, maybe. Then returning a 
QString will become even more expensive. It already got more expensive, 
since instead of sizeof(void*) it's now two or three (didn't check) 
words, but adding SSO will not make it better. And if we don't get SSO, 
classes can decide to store u16string instead, which _has_ SSO already. 
So, efficiency wins here, too.

>>   if (r.rectCount() == 1) {
>>      use(r.boundingRect());
>>   } else {
>>      const auto rects = r.rects();
>>      for (const QRect &rect : rects)
>>         use(rect);
>>   }
>> 
>> If rects() had been a view (today, we'd use gsl/std::span<const 
>> QRect>), all users could just do
> 
> I don’t get that argument. The region is a list of rectangles today,
> so you could simply add a rects() method that returns them and the
> code below would work.

I think you should take a look at QRegionPrivate::begin() (in Qt 6 or Qt 
5). And in Qt 5's version of QRegion::rects().

>>   for (const QRect &rect : r.rects())
>>      use(rect);
>> 
>> This is objectively a better API, for two reasons: a) the user doesn't 
>> need to care about some weird idiosyncrasies of the class to avoid 
>> performance penalties and b) the class author is now free to extend 
>> the SSO buffer from one to, say, four, without changing the API, not 
>> even those affected by Hyrum.
>> 
>> It _seems_ your solution is to fold views into owning containers, and 
>> while that may seem to work, it's dangerous:
>> 
>> Assume QRegion::rects() returned a QVector-acting-as-view. Then this 
>> would silently fail:
>> 
>>    QRegion region();
>>    for (const QRect &rect : region().rects())
>>        use(rect);
>> 
>> because, clearly, QVector is an owning container, so we don't care 
>> that the QRegion went out of scope. Whereas with a view, it will be 
>> immediately obvious (to a tool like Clazy, at least) that this can't 
>> work.
> 
> The above example is rather weirdly constructed.
> 
> But anyhow, those data lifetime issues when using views as return
> values extensively are a lot less obvious to spot when a human reads
> the code. APIs should be safe to use wherever possible, so that our
> users don’t have to worry about those things. This will lead to fewer
> bugs in the resulting code and faster development times. Trading that
> for some saved CPU cycles is in almost all cases (and yes there are
> exceptions in low level classes) a very bad deal.

You didn't get my point: If I return a view, it's clear what's going on 
(to user and tools) and that the data will only be valid until a 
non-const member function on the source object is called. So far so 
simple. This is what we have with QString::data() and a ton of other 
APIs, and it's easy to understand.

Now change that to an owning container. Say QString, for the sake of 
argument. Now, users (and tools) expect that QString to own the data, 
but that's far from guaranteed. We had the problem with QStringLiteral 
in plugins, where the data just went away on plugin unloading. Solution? 
Don't actually unload plugins on unloading. Wow! What we do to save 
QString-as-a-view! Tell me how that's convenient and easy API? Had all 
those functions not returned QStringLiterals through QString, but 
through QStringView, it would have been more suggestive to copy the data 
than it was with QString.

And this will just become more pronounced if every construction from a 
char16_t* will create a QString that doesn't actually own the data. By 
choosing QString (owning container or view) over QStringView (view only) 
in APIs, you (deliberately, if I may say so) blurred the line between 
owning container and view, and in doing so you kill the raison d'étre 
for returning owning containers: avoiding dangling references. This is 
neither convenient nor easy. Actually, you're forcing all users to take 
a deep copy, as per

    QString deepCopy(QStringView v) { return v.toString(); }

(and hope that suggestions for QStringView carrying the d-pointer in a 
vein attempt at "optimizing" QStringView::toString() will not be 
implemented).

>> So, I'd argue for the complete opposite here: we would increase 
>> encapsulation of our APIs if they stopped trafficking in owning 
>> container types. I call this the NOI pattern (Non-Owning Interface). 
>> By not having to serve QString everywhere, we'll be much more free to 
>> use alternative storage types in the implementation (e.g. 
>> QVarLengthArray<char16_t>, or - the horror! - std::pmr::u16string). 
>> Handling-wise, QStringView makes all these choices equal, so the 
>> implementation of a class can use whatever is objectively optimal 
>> instead of being bound to QString.
> 
> You can just as well argue the other way round. Returning a view
> requires the class to store the data in continuous memory. It can’t
> create it on the fly if asked.

That's not true. You can always do the construction lazily and then 
return a view. Thread-safety is an issue, yes, but it's not terribly 
difficult to fix (and not fixed in many other classes, either, QFileInfo 
comes to mind, so it can't be that important). Compared to the 'usual' 
implementation in these cases, repeated calls to the function won't have 
to re-calculate the result anew each time (see 
https://codereview.qt-project.org/c/qt/qtbase/+/299986 for a recent 
example).

>> AsidE: If you think that CoW is still a thing today: no. SSO is a 
>> thing these days, and it seems that QString will not have it in Qt 6, 
>> either. NOI favours SSO, QString-everywhere cements the naïve CoW 
>> world of the 1990s for yet another decade.
> 
> Let’s see if we can get SSO working for QString and QBA in time. It
> should not be very difficult to implement with the new structure we’re
> having.

Even if you enable it for QString and QBA, you can't implement it for 
QVector without breaking tons of code that relies on iterator stability 
(which is why std::vector can't do it, either).

> You might call CoW naive, but I do believe that the fact that Qt does
> use containers that own their data in most of our APIs is what makes
> Qt significantly simpler and safer to use than many other APIs.

Agreed. But you're in the process of blurring the line between owning 
containers and views. This already started in Qt 5 with QStringLiteral 
and even earlier with QString::fromRawData(). At least the former was 
statically-allocated and presented a problem in only a a very limited 
circumstances and the latter is explicit. But you're making 
QString(const char16_t*) the equivalent of QString::fromRawData() now 
and so the cases where QStrings are actually views that doesn't own the 
data is going to explode.

    QLineEdit *le = ...;
    {
        std::u16string u16s = businessResult();
        le->setText(u16s.data()); // = setText(QString(const char16_t*)) 
= setText(QString::fromRawData())
    } // BOOM

No such API exists in std, so a std::vector or a std::string _always_ 
own the data. For everything else, there's string_view and span, which 
_never_ own.

> Using views in our API would make it in many cases harder to track
> lifetime, esp. if they are combined with the use of auto. Yes, tools
> like clazy can help, but I’d rather have inherent safety than rely on
> additional tools.

I said in a review the other days is that APIs are easy to use _not_ 
when the number of classes is minimized, but when the responsibilities 
of any given class are minimal and to-the-point. For a given domain 
area, that means more small classes are better than fewer large ones.

You don't want to port APIs to QStringView, because it's a ton of work, 
but you want to the benefits, so you're folding QStringView into QString 
and make the use of QString that much harder (btw: the same happened to 
QList/QVector).

That's legitimate, esp. as a step in the direction of fully 
view-enabling the library.

It's _not_ legitimate to claim that this makes the API "easier to use, 
more logical and convenient". It doesn't. It's also _not_ legitimate to 
use this as an argument against more QStringView-overloads around the 
library.

To spell it out: Just like QList-is-QVector, what you're doing to 
QString is a (hopefully stop-gap) measure to avoid rewriting all the 
APIs and classes to take and accept views at the expense of making 
QString even use harder to reason about than it already has been in Qt 
5.

>> Learn from QRegion!
>> 
>> I have spoken on many conferences (at least QtWS, Meeting C++, emBO++) 
>> on this, if anyone wants to learn more.
> 
> Not everybody agrees with your opinions, and we need to remember that
> most of our users are not necessarily people knowing the C++ standard
> inside out. And they *shouldn’t* have to be.

I fail to see what this has to do with "knowing the C++ standard inside 
out". Maybe you can enlighten me? AFAICT, all I'm doing is applying 
sound engineering principles, incl., but not limited to, "make an API 
easy to use and hard to misuse" and "minimal, _efficient_ basis of 
operations" to Qt.

Thanks,
Marc