[Development] C++20 comparisons @ Qt (was: Re: C++20 @ Qt)

Wed Jun 14 10:52:49 CEST 2023

On 03.11.22 10:40, Marc Mutz via Development wrote:
> TL;DR: provide named functions for the minimal set of op== and op<=>
> we'd have to write in C++20, then use macros to turn these into op==,
> op!=, op<, op>, op<=, op>= for C++17 builds and op== and op<=> for C++20
> builds.

This part somehow became lost and kinda made the whole feature miss the 
6.6 train, so let me expand on it a bit, and collect open questions 
(marked with → below):

We want the implementation of these relational operators to be as close 
as possible to the way C++20 will work: There, you implement op== and 
op<=> and the compiler synthesizes everything else (cf. parent message 
for details). So we need one function E backing op== and, for ordered 
types, another function O backing op<=>.

== Why E _and_ O? ==

The reason we need E _and_ O for ordered types instead of just O is that 
O needs to order the lhs w.r.t. the rhs, which generally involves 
looking at all the state of the objects whereas E just needs to find 
_one_ difference to be able to quickly return false. E.g., a container 
can do

     bool E(~~~ lhs, ~~~ rhs) {
         if (lhs.size() != rhs.size()) return false; // O cannot do this!
         ~~~~
     }

This is a very important optimization (and one reason why L1 string 
literals are going to stay and not be replaced with UTF-8 ones: L1 op 
UTF-16 _can_ use the size() short-cut while UTF-8 op UTF-16 cannot).
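
To make the difference concrete, here is a minimal sketch of an E/O pair 
for a toy container. IntList, equal() and order() are placeholder names 
made up for this illustration, not proposed API, and O returns a plain 
int (<0, 0, >0) instead of a C++20 ordering type so that the sketch also 
builds as C++17:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container; equal()/order() are placeholder names for E and O.
struct IntList {
    std::vector<int> items;
};

// E: only needs to find one difference, so a size mismatch settles it.
inline bool equal(const IntList &lhs, const IntList &rhs) noexcept
{
    if (lhs.items.size() != rhs.items.size())
        return false; // O cannot do this
    return std::equal(lhs.items.begin(), lhs.items.end(), rhs.items.begin());
}

// O: must rank lhs relative to rhs, so it has to walk the common prefix
// even when the sizes differ. Returns <0, 0 or >0 (an int stands in for a
// C++20 ordering type so that the sketch also builds as C++17).
inline int order(const IntList &lhs, const IntList &rhs) noexcept
{
    const std::size_t n = std::min(lhs.items.size(), rhs.items.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (lhs.items[i] != rhs.items[i])
            return lhs.items[i] < rhs.items[i] ? -1 : 1;
    }
    if (lhs.items.size() == rhs.items.size())
        return 0;
    return lhs.items.size() < rhs.items.size() ? -1 : 1;
}

inline void sizeShortcutDemo()
{
    const IntList a{{9}};
    const IntList b{{1, 2, 3}};
    assert(!equal(a, b));    // decided by size alone
    assert(order(a, b) > 0); // 9 > 1, although a is shorter
}
```

Note how the shorter list orders _after_ the longer one here, which is 
exactly why O cannot stop at the size check.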

== Can E be spelled op==? ==

It could. However, op== cannot have extra arguments and we have at least 
one use-case where E might have an additional argument: case-insensitive 
string comparisons. We currently have no (public) API to express op==, 
but case-insensitively. The closest we have is QString::compare(), but 
that is an O, so it cannot use the size() mismatch shortcut, assuming 
it's valid for case-insensitive string comparison (it is for L1, dunno 
about UTF-16).

So there's a certain appeal in using E as a way to consistently spell 
"op== with extra arguments". We could, e.g. do E(lhs, rhs, 
Qt::FuzzyComparison) for anything involving FP.
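
As a rough sketch of what "op== with extra arguments" could look like, 
the following uses std::string_view and a locally defined enum as 
stand-ins for Qt's string views and Qt::CaseSensitivity. The signature is 
an assumption for illustration only, and the sketch is restricted to 
ASCII, where case folding never changes the length:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for Qt::CaseSensitivity.
enum class CaseSensitivity { CaseInsensitive, CaseSensitive };

// E with an extra argument (ASCII only in this sketch).
inline bool equal(std::string_view lhs, std::string_view rhs,
                  CaseSensitivity cs = CaseSensitivity::CaseSensitive) noexcept
{
    if (lhs.size() != rhs.size())
        return false; // valid for ASCII: case folding keeps the length
    if (cs == CaseSensitivity::CaseSensitive)
        return lhs == rhs;
    for (std::size_t i = 0; i < lhs.size(); ++i) {
        const auto l = static_cast<unsigned char>(lhs[i]);
        const auto r = static_cast<unsigned char>(rhs[i]);
        if (std::tolower(l) != std::tolower(r))
            return false;
    }
    return true;
}

inline void caseInsensitiveDemo()
{
    assert(!equal("Qt", "QT"));
    assert(equal("Qt", "QT", CaseSensitivity::CaseInsensitive));
}
```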

There's also the situation where (cheap!) implicit conversions allow one 
E to backfill several different op== (which, being hidden friends, don't 
participate in implicit conversions themselves). If E is spelled op==, 
what would the macros use to implement op==? It would be up to the class 
author to supply sufficiently-many op== in the correct form. We don't 
want that. We want all op== to be generated from the macros so we have 
central control over their generation (providing reversed operators in 
C++17, getting the signatures right, noexcept-ness, hidden-friend-ness, 
...).
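
To illustrate the idea (and only the idea: the macro names below are 
invented for this mail and are not the ones from the patch series), a 
class author would write just E, and a macro would generate the 
hidden-friend operators from it, including the reversed and negated ones 
that C++17 does not synthesize:

```cpp
#include <cassert>

#if defined(__cpp_impl_three_way_comparison) \
    && __cpp_impl_three_way_comparison >= 201907L
// C++20: op!= and the reversed forms are synthesized from op==.
#  define DEMO_EQUALITY_EXTRAS(Self, Other)
#else
// C++17: spell out negated and reversed operators by hand.
#  define DEMO_EQUALITY_EXTRAS(Self, Other) \
    friend bool operator!=(const Self &lhs, const Other &rhs) noexcept \
    { return !equal(lhs, rhs); } \
    friend bool operator==(const Other &lhs, const Self &rhs) noexcept \
    { return equal(rhs, lhs); } \
    friend bool operator!=(const Other &lhs, const Self &rhs) noexcept \
    { return !equal(rhs, lhs); }
#endif

// Hypothetical macro: generates all mixed-type equality operators from E.
#define DEMO_MIXED_EQUALITY(Self, Other) \
    friend bool operator==(const Self &lhs, const Other &rhs) noexcept \
    { return equal(lhs, rhs); } \
    DEMO_EQUALITY_EXTRAS(Self, Other)

class Meters {
public:
    explicit Meters(int v) noexcept : m_value(v) {}

private:
    int m_value;

    // E: the only comparison function the class author writes by hand.
    friend bool equal(const Meters &lhs, int rhs) noexcept
    { return lhs.m_value == rhs; }

    DEMO_MIXED_EQUALITY(Meters, int)
};

inline void generatedOperatorsDemo()
{
    const Meters m(5);
    assert(m == 5);
    assert(5 == m); // reversed: generated in C++17, synthesized in C++20
    assert(m != 6);
    assert(6 != m);
}
```

Signatures, noexcept and hidden-friend-ness are all decided in one place, 
the macro, rather than by each class author.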

So I would require new information to accept anything other than a "no" 
here:

→ Semi-Open question 1:
Should E be spelled op==?

My take is no.

→ Open question 2:
Should E exist for all types, as a "concept name" for "op== with 
parameters", or should we leave that, as it is now, on a type-by-type 
basis to each class author individually?

My take is that yes, it should exist as a "concept name". I added 
qStringsEqual() in 5.10 as public API when adding QStringView, but it 
was relegated to private API before the release. When class authors are 
given the option to accept parameters for equality comparison, they will 
find novel ways to use it (like Qt::Fuzzy).

== O as public API ==

In C++20, O would be spelled op<=>, but this is not possible in C++17, 
so O needs to be an actual named function, not an operator. The 
question, and it's a rhetorical one, then becomes: should O be public 
API, and the (rhetorical) answer is "yes, because C++17 users will want 
to be able to access O in C++17 projects, too (in C++20, they could call 
op<=>)". This includes Qt's own implementation of O's.

The gold standard for named operators in C++ is to have the 
implementation as a hidden friend (think swap()), possibly a member 
function (lhs.swap(rhs)), a (namespaced) fallback version for built-in 
types (std::swap(int, int)) and a calling convention that takes this 
into account:

    lhs.swap(rhs);
    // or
    std::swap(lhs, rhs); // if you know decltype(lhs), ...

    using std::swap;
    swap(lhs, rhs);
    // or
    std::ranges::swap(lhs, rhs); // ... if you don't.

Applied to our situation, this means:

- each (orderable) class supplies O as a hidden friend
- may supply it as a (non-static) member function
- Qt provides an implementation for built-in types in a namespace (we 
only have 'Qt').

And we have the calling convention:

    lhs.O(rhs);
    Qt::O(lhs, rhs); // if you know the type

    using Qt::O;
    O(lhs, rhs);
    O_as_CPO(lhs, rhs); // if you don't

O_as_CPO is to O what std::ranges::swap is to swapping: an ADL-enabled 
version that lets callers skip the using Qt::O / using std::swap 
declaration.
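
The scheme above could be sketched as follows. Namespace Demo stands in 
for namespace Qt, and order(), orderCpo and Revision are placeholder 
names chosen for this illustration. As before, an int result replaces a 
C++20 ordering type to keep the sketch C++17-compatible:

```cpp
#include <cassert>

namespace Demo { // stand-in for namespace Qt

// Fallback O for a built-in type.
inline int order(int lhs, int rhs) noexcept
{ return lhs < rhs ? -1 : (lhs > rhs ? 1 : 0); }

namespace detail {
struct OrderFn {
    template <typename L, typename R>
    auto operator()(const L &lhs, const R &rhs) const
    {
        using Demo::order;      // make the built-in fallback visible ...
        return order(lhs, rhs); // ... and let ADL find hidden friends
    }
};
} // namespace detail

// O_as_CPO: a function object, so callers need no using-declaration.
inline constexpr detail::OrderFn orderCpo{};

} // namespace Demo

class Revision {
public:
    explicit Revision(int number) noexcept : m_number(number) {}

    // Optional member-function spelling of O.
    int order(const Revision &other) const noexcept
    { return Demo::order(m_number, other.m_number); }

private:
    int m_number;

    // O as a hidden friend: found only via ADL.
    friend int order(const Revision &lhs, const Revision &rhs) noexcept
    { return lhs.order(rhs); }
};

inline void callingConventionDemo()
{
    const Revision a(1), b(2);
    assert(a.order(b) < 0);            // member, type known
    assert(Demo::order(4, 2) > 0);     // namespaced fallback, type known
    assert(Demo::orderCpo(a, b) < 0);  // generic code, class type
    assert(Demo::orderCpo(3, 3) == 0); // generic code, built-in type
}
```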

The same would be applicable to E if Open Question 1 ends up resolved as 
no and Open Question 2 as "yes, named concept".

This leads to the finding that O (and, depending on the outcome of the 
first two Open Questions, E) cannot be named like static binary 
functions in existing classes, as, even if they are semantically 
compatible, they're syntactically incompatible.

That objectively rules out "compare" for O and "equals" for E.

== Naming O ==

Given that compare() doesn't work, what are alternatives? In the patch 
series so far, we've been using order(), as the operation orders the two 
arguments (and returns an ordering type). Alternatives that were 
discussed are ordering(), which I'd be ok with, and qCompare(), which I 
would reserve as a CPO (i.e. as O_as_CPO), if we don't use qOrder or 
qOrdering for that.

→ Open Question 3:
What should we call O?

My take: After contemplating this for the last two weeks: cmp() (1st), 
order() (2nd) or ordering() (3rd). I really think it pays for these 
function names to be _short_. Size _does_ matter: having very common 
functions be one word and therefore visually distinct from 
multiWordIdentifiers helps readability. Bjarne is right: people ask for 
verbose syntax for new features, so they stand out, then the whole 
ecosystem suffers from ugliness for decades to come.

== Naming E ==

So far, we've been using equal(). equals() doesn't work for the 
technical reasons given above. And while it'd read fine as a member 
function, lhs.equals(rhs), it's also kinda wrong for a function taking 
two arguments: equals(lhs, rhs) uses the singular, but there are _two_ 
objects. So equal(), as the plural form of equals(), makes sense. There 
were also proposals of qEqual() (which I'd again reserve as the CPO name 
of E) and areEqual() (which I find uglily long).

→ Open Question 4:
What, if anything, should we call E?

My take: eq() (1st), equal() (2nd). I _really_ don't like areEqual() for 
the reasons given.

order() and equal() have the nice property of being equally long. With 
equal() typically returning bool and order() often returning auto, this 
makes for a very pleasing alignment of the two functions. But after 
thinking about this long and hard, I think eq() and cmp() are best, esp. 
if you consider that they stand for == and <=> :)

Thanks,
Marc

-- 
Marc Mutz <marc.mutz at qt.io>
Principal Software Engineer

The Qt Company
Erich-Thilo-Str. 10 12489
Berlin, Germany
www.qt.io

Geschäftsführer: Mika Pälsi, Juha Varelius, Jouni Lintunen
Sitz der Gesellschaft: Berlin,
Registergericht: Amtsgericht Charlottenburg,
HRB 144331 B