[Development] Views in APIs
Marc Mutz
marc.mutz at kdab.com
Wed May 13 11:43:14 CEST 2020
On 2020-05-13 09:04, Lars Knoll wrote:
>> On 12 May 2020, at 18:59, Marc Mutz <marc.mutz at kdab.com> wrote:
[...]
>>> Most other classes:
>>> * Only take and return QString
>>
>> This is wrong. By taking a QString in the API of non-string-related
>> APIs, you expose QString as an implementation detail of the class.
>> Think QRegion, which, before I added begin()/end(), had a SSO-like
>> internal container (1 QRect or QVector<QRect>) which forced most users
>> to code around the owning-container API like this:
>
> I don’t think you can compare those cases. QString is the container
> our users use in 99% of the cases to hold a string. And this won’t
> change (and I don’t think we should advocate any changes here).
>
> So then, an API taking and returning a QString is the most logical,
> easiest and most convenient to use.
There are two independent issues here: taking and returning.
For taking I don't see how "taking QString is most logical, easiest,
most convenient" follows. It's simply not true that 99% of the cases to
hold a string use QString. In the common case of applications which are
not localized, all the nice QString APIs are fed with const char*
literals, causing QString creation and destruction in user code. Even if
you change all these to const char16_t* and a QString that doesn't
allocate in this case, you still have all these complex (in the
QTypeInfo sense) QString objects created and destroyed at the call site.
This bloats user code. If all those shiny QString APIs would take
QStringView instead, the construction and destruction of which is
trivial (in the C++ sense), the compiler can remove all those calls and
the QString construction (if any) happens centrally inside the library
function (O(1) instead of O(N), N = #callers).
In the not so hypothetical case that Qt is used to visualize results of
some business calculations, chances are that thrid-party libraries will
use std::string or std::u16string, and not QString, requiring the use of
QString::fromStdString() to pass these to a QString API. Had the API
taken QStringView, no extra code would have been necessary.
So, I have shown that taking QString is neither the most convenient, nor
the most easy, choice, as it requires users with other string types as
data sources to jump through hoops to pass their data. Only if you
employ a very narrow focus where both efficiency and the existence of
3rd-party code are all ignored, can you still maintain that a function
taking an owning container is more convenient and easier than one taking
a view. The only logic in such an API that _I_ can find, however, is
"MUST ... NOT ... BREAK ... COW logic", which has been proven as flawed
over twenty years ago:
http://www.gotw.ca/gotw/043.htm
http://www.gotw.ca/gotw/044.htm
http://www.gotw.ca/gotw/045.htm
(and CoW isn't even correctly implemented in Qt, ever since unsharable
data was removed).
In case the internal storage _is_ QString, then providing a QString&&
overload to avoid the copy is a good idea, if you're willing to impart
the details of the implementation that way.
Second, returning.
You talk about QString getting an SSO buffer, maybe. Then returning a
QString will become even more expensive. It already got more expensive,
since instead of sizeof(void*) it's now two or three (didn't check)
words, but adding SSO will not make it better. And if we don't get SSO,
classes can decide to store u16string instead, which _has_ SSO already.
So, efficiency wins here, too.
>> if (r.rectCount() == 1) {
>> use(r.boundingRect());
>> } else {
>> const auto rects = r.rects();
>> for (const QRect &rect : rects)
>> use(rect);
>> }
>>
>> If rects() had been a view (today, we'd use gsl/std::span<const
>> QRect>), all users could just do
>
> I don’t get that argument. The region is a list of rectangles today,
> so you could simply add a rects() method that returns them and the
> code below would work.
I think you should take a look at QRegionPrivate::begin() (in Qt 6 or Qt
5). And in Qt 5's version of QRegion::rects().
>> for (const QRect &rect : r.rects())
>> use(rect);
>>
>> This is objectively a better API, for two reasons: a) the user doesn't
>> need to care about some weird idiosyncrasies of the class to avoid
>> performance penalties and b) the class author is now free to extend
>> the SSO buffer from one to, say, four, without changing the API, not
>> even those affected by Hyrum.
>>
>> It _seems_ your solution is to fold views into owning containers, and
>> while that may seem to work, it's dangerous:
>>
>> Assume QRegion::rects() returned a QVector-acting-as-view. Then this
>> would silently fail:
>>
>> QRegion region();
>> for (const QRect &rect : region().rects())
>> use(rect);
>>
>> because, clearly, QVector is an owning container, so we don't care
>> that the QRegion went out of scope. Whereas with a view, it will be
>> immediately obvious (to a tool like Clazy, at least) that this can't
>> work.
>
> The above example is rather weirdly constructed.
>
> But anyhow, those data lifetime issues when using views as return
> values extensively are a lot less obvious to spot when a human reads
> the code. APIs should be safe to use wherever possible, so that our
> users don’t have to worry about those things. This will lead to fewer
> bugs in the resulting code and faster development times. Trading that
> for some saved CPU cycles is in almost all cases (and yes there are
> exceptions in low level classes) a very bad deal.
You didn't get my point: If I return a view, it's clear what's going on
(to user and tools) and that the data will only be valid until a
non-const member function on the source object is called. So far so
simple. This is what we have with QString::data() and a ton of other
APIs, and it's easy to understand.
Now change that to an owning container. Say QString, for the sake of
argument. Now, users (and tools) expect that QString to own the data,
but that's far from guaranteed. We had the problem with QStringLiteral
in plugins, where the data just went away on plugin unloading. Solution?
Don't actually unload plugins on unloading. Wow! What we do to save
QString-as-a-view! Tell me how that's convenient and easy API? Had all
those functions not returned QStringLiterals through QString, but
through QStringView, it would have been more suggestive to copy the data
than it was with QString.
And this will just become more pronounced if every construction from a
char16_t* will create a QString that doesn't actually own the data. By
choosing QString (owning container or view) over QStringView (view only)
in APIs, you (deliberately, if I may say so) blurred the line between
owning container and view, and in doing so you kill the raison d'étre
for returning owning containers: avoiding dangling references. This is
neither convenient nor easy. Actually, you're forcing all users to take
a deep copy, as per
QString deepCopy(QStringView v) { return v.toString(); }
(and hope that suggestions for QStringView carrying the d-pointer in a
vein attempt at "optimizing" QStringView::toString() will not be
implemented).
>> So, I'd argue for the complete opposite here: we would increase
>> encapsulation of our APIs if they stopped trafficking in owning
>> container types. I call this the NOI pattern (Non-Owning Interface).
>> By not having to serve QString everywhere, we'll be much more free to
>> use alternative storage types in the implementation (e.g.
>> QVarLengthArray<char16_t>, or - the horror! - std::pmr::u16string).
>> Handling-wise, QStringView makes all these choices equal, so the
>> implementation of a class can use whatever is objectively optimal
>> instead of being bound to QString.
>
> You can just as well argue the other way round. Returning a view
> requires the class to store the data in continuous memory. It can’t
> create it on the fly if asked.
That's not true. You can always do the construction lazily and then
return a view. Thread-safety is an issue, yes, but it's not terribly
difficult to fix (and not fixed in many other classes, either, QFileInfo
comes to mind, so it can't be that important). Compared to the 'usual'
implementation in these cases, repeated calls to the function won't have
to re-calculate the result anew each time (see
https://codereview.qt-project.org/c/qt/qtbase/+/299986 for a recent
example).
>> AsidE: If you think that CoW is still a thing today: no. SSO is a
>> thing these days, and it seems that QString will not have it in Qt 6,
>> either. NOI favours SSO, QString-everywhere cements the naïve CoW
>> world of the 1990s for yet another decade.
>
> Let’s see if we can get SSO working for QString and QBA in time. It
> should not be very difficult to implement with the new structure we’re
> having.
Even if you enable it for QString and QBA, you can't implement it for
QVector without breaking tons of code that relies on iterator stability
(which is why std::vector can't do it, either).
> You might call CoW naive, but I do believe that the fact that Qt does
> use containers that own their data in most of our APIs is what makes
> Qt significantly simpler and safer to use than many other APIs.
Agreed. But you're in the process of blurring the line between owning
containers and views. This already started in Qt 5 with QStringLiteral
and even earlier with QString::fromRawData(). At least the former was
statically-allocated and presented a problem in only a a very limited
circumstances and the latter is explicit. But you're making
QString(const char16_t*) the equivalent of QString::fromRawData() now
and so the cases where QStrings are actually views that doesn't own the
data is going to explode.
QLineEdit *le = ...;
{
std::u16string u16s = businessResult();
le->setText(u16s.data()); // = setText(QString(const char16_t*))
= setText(QString::fromRawData())
} // BOOM
No such API exists in std, so a std::vector or a std::string _always_
own the data. For everything else, there's string_view and span, which
_never_ own.
> Using views in our API would make it in many cases harder to track
> lifetime, esp. if they are combined with the use of auto. Yes, tools
> like clazy can help, but I’d rather have inherent safety than rely on
> additional tools.
I said in a review the other days is that APIs are easy to use _not_
when the number of classes is minimized, but when the responsibilities
of any given class are minimal and to-the-point. For a given domain
area, that means more small classes are better than fewer large ones.
You don't want to port APIs to QStringView, because it's a ton of work,
but you want to the benefits, so you're folding QStringView into QString
and make the use of QString that much harder (btw: the same happened to
QList/QVector).
That's legitimate, esp. as a step in the direction of fully
view-enabling the library.
It's _not_ legitimate to claim that this makes the API "easier to use,
more logical and convenient". It doesn't. It's also _not_ legitimate to
use this as an argument against more QStringView-overloads around the
library.
To spell it out: Just like QList-is-QVector, what you're doing to
QString is a (hopefully stop-gap) measure to avoid rewriting all the
APIs and classes to take and accept views at the expense of making
QString even use harder to reason about than it already has been in Qt
5.
>> Learn from QRegion!
>>
>> I have spoken on many conferences (at least QtWS, Meeting C++, emBO++)
>> on this, if anyone wants to learn more.
>
> Not everybody agrees with your opinions, and we need to remember that
> most of our users are not necessarily people knowing the C++ standard
> inside out. And they *shouldn’t* have to be.
I fail to see what this has to do with "knowing the C++ standard inside
out". Maybe you can enlighten me? AFAICT, all I'm doing is applying
sound engineering principles, incl., but not limited to, "make an API
easy to use and hard to misuse" and "minimal, _efficient_ basis of
operations" to Qt.
Thanks,
Marc
More information about the Development
mailing list