[Development] Views in APIs (was: Re: QString and related changes for Qt 6)

Wed May 13 09:04:38 CEST 2020

> On 12 May 2020, at 18:59, Marc Mutz <marc.mutz at kdab.com> wrote:
> 
> Hi Lars,
> 
> Let me pick important points one per email here:
> 
> On 2020-05-12 09:49, Lars Knoll wrote:
> [...]
>> For String related classes:
>> * All methods not taking ownership of the passed arguments take a
>> QStringView
> 
> Almost (see below).
> 
>> * If the method stores a pointer to the passed data it should take a
>> QString to not surprise users.
> 
> ... in addition ... (though see below).

I don’t see a need to take a QStringView in those cases. With the exception of some low level APIs, Qt’s APIs should be safe to use. An overload taking a QStringView would either need to copy the data (in which case we could just as well only offer the QString version), or it holds an ‘unsecured’ pointer to the data, and lifetime is out of control of the class.

The second option is something I’m only willing to accept in a very limited number of cases, and the API naming should make it clear that this is what’s happening.

>> Exceptions can be done where it makes
>> sense, but then the method naming has to give clear indications that
>> this happens (like e.g. fromRawData())
>> * Return a QString in QString itself and when doing conversions,
>> return QStringView from QStringView
>> * No QStringRef!
>> * QLatin1String for backwards compatibility, can be disabled with a
>> macro (similar to QT_NO_CAST_FROM_ASCII)
>> * Remove or deprecate overloads taking a (const Char *, length) pair.
>> Replace them with QStringView
> 
> +1 to the above
> 
>> Most other classes:
>> * Only take and return QString
> 
> This is wrong. By taking a QString in the API of non-string-related APIs, you expose QString as an implementation detail of the class. Think QRegion, which, before I added begin()/end(), had a SSO-like internal container (1 QRect or QVector<QRect>) which forced most users to code around the owning-container API like this:

I don’t think you can compare those cases. QString is the container our users use in 99% of the cases to hold a string. And this won’t change (and I don’t think we should advocate any changes here).

So then, an API taking and returning a QString is the most logical, easiest and most convenient to use. 

> 
>   if (r.rectCount() == 1) {
>      use(r.boundingRect());
>   } else {
>      const auto rects = r.rects();
>      for (const QRect &rect : rects)
>         use(rect);
>   }
> 
> If rects() had been a view (today, we'd use gsl/std::span<const QRect>), all users could just do

I don’t get that argument. The region is a list of rectangles today, so you could simply add a rects() method that returns them and the code below would work.
> 
>   for (const QRect &rect : r.rects())
>      use(rect);
> 
> This is objectively a better API, for two reasons: a) the user doesn't need to care about some weird idiosyncrasies of the class to avoid performance penalties and b) the class author is now free to extend the SSO buffer from one to, say, four, without changing the API, not even those affected by Hyrum.
> 
> It _seems_ your solution is to fold views into owning containers, and while that may seem to work, it's dangerous:
> 
> Assume QRegion::rects() returned a QVector-acting-as-view. Then this would silently fail:
> 
>    QRegion region();
>    for (const QRect &rect : region().rects())
>        use(rect);
> 
> because, clearly, QVector is an owning container, so we don't care that the QRegion went out of scope. Whereas with a view, it will be immediately obvious (to a tool like Clazy, at least) that this can't work.

The above example is rather weirdly constructed. 

But anyhow, those data lifetime issues when using views as return values extensively are a lot less obvious to spot when a human reads the code. APIs should be safe to use wherever possible, so that our users don’t have to worry about those things. This will lead to fewer bugs in the resulting code and faster development times. Trading that for some saved CPU cycles is in almost all cases (and yes there are exceptions in low level classes) a very bad deal.
> 
> So, I'd argue for the complete opposite here: we would increase encapsulation of our APIs if they stopped trafficking in owning container types. I call this the NOI pattern (Non-Owning Interface). By not having to serve QString everywhere, we'll be much more free to use alternative storage types in the implementation (e.g. QVarLengthArray<char16_t>, or - the horror! - std::pmr::u16string). Handling-wise, QStringView makes all these choices equal, so the implementation of a class can use whatever is objectively optimal instead of being bound to QString.

You can just as well argue the other way round. Returning a view requires the class to store the data in continuous memory. It can’t create it on the fly if asked.
> 
> AsidE: If you think that CoW is still a thing today: no. SSO is a thing these days, and it seems that QString will not have it in Qt 6, either. NOI favours SSO, QString-everywhere cements the naïve CoW world of the 1990s for yet another decade.

Let’s see if we can get SSO working for QString and QBA in time. It should not be very difficult to implement with the new structure we’re having.

You might call CoW naive, but I do believe that the fact that Qt does use containers that own their data in most of our APIs is what makes Qt significantly simpler and safer to use than many other APIs.

Using views in our API would make it in many cases harder to track lifetime, esp. if they are combined with the use of auto. Yes, tools like clazy can help, but I’d rather have inherent safety than rely on additional tools.

> 
> Learn from QRegion!
> 
> I have spoken on many conferences (at least QtWS, Meeting C++, emBO++) on this, if anyone wants to learn more.

Not everybody agrees with your opinions, and we need to remember that most of our users are not necessarily people knowing the C++ standard inside out. And they *shouldn’t* have to be.

Cheers,
Lars