[Development] Use of Standard Library containers in Qt source code

Mon Jul 4 14:07:19 CEST 2016

Stephen Kelly gave QQuickListViewPrivate::applyInsertionChange as an
example of "raw" for-loops being tricky to follow:

>>>> There's a `++index` somewhere in the middle 3 scopes deep. Further
>>>> up in the function `index` is assigned, but then assigned again
>>>> (conditionally) from the `count` of a container. Then `index` is
>>>> used later in the function, but I can't reason about what it is
>>>> when it is used.
>>>>
>>>> There are many other problems with that function, which are caused
>>>> by many 'raw' loops nested up to four levels, affecting variables
>>>> assigned outside top-level if() conditions half a page
>>>> up. Extracting such code into named functions would indicate the
>>>> intent of the function.

I've fixed issues in qmake [0] where for-loops caused similar problems.
I *was* able to reason about them, but I made mistakes.  Fortunately, we
have some good disciplines like auto-testing and code-review, that
helped me notice them.

[0] e.g. https://codereview.qt-project.org/#/c/161133

Benjamin TERRIER opined:

>>> I'd boldly reply to that that if one is unwilling to reason about a
>>> piece of code, one should not be fixing it.

and yet, when I am tracking an elusive bug to its lair, I have to make
sense of all the pieces of code that I stumble upon in the course of my
hunt.  I may not be competent to fix it, but I have to be able to make
sense of it to determine *whether* it's the cause of my bug; and, even
if it isn't, I have to make enough sense of it to determine *how* its
eccentricities interact with the rest of the tangle of moving parts that
might be implicated in my bug.  I don't mind having to go to someone
else to get it fixed half as much as I do the problem of having to make
sense of whether it's to blame, when I'm not getting any response to my
e-mail to the author for help determining what part it plays in the bug
I'm chasing.  Each bug I hunt obliges me to make sense of way more
pieces of code than the roughly one that I actually end up having to
fix.

>>> But I'd also agree that the code should be as readable as possible
>>> to get a many developers to contribute.

+1.  Also: to waste as little of my time as possible.

>>> Back to Qt current situation I think that the lack of comments in
>>> some parts is a bigger throwback than the code style.

+1.  Sometimes code is solving a complex problem; then, it likely can't
help but *be* complex.  A gentle dusting of comments (and, in extreme
cases, a big fat overview-and-theory comment somewhere easy to find in
the right file) can go a long way towards helping the visiting
bug-hunter to make sense of the thicket they're stumbling through.

Stephen Kelly gave an example of some code I would complain about in
review, for quite other reasons than those he intended, and then said:

>> It is not possible to reason about code like that.

I attest, for the record, that I can reason about code like that.  Then
again, I came to C++ via a long history of C and some pre-history I'd
rather not talk about.  I know it can get tricky, and I'm glad you
didn't stoop to goto in your example.

>> I have the feeling that faced with code like that, people just don't
>> attempt to reason about it while at the same time trying to get on
>> with what they are trying to do.

There's always some code that's harder than others to make sense of.
Everything should be made as simple as possible - but no simpler.  When
solving a complex problem, the solution is apt to be likewise complex;
as long as its complexity is truly implied by the problem being solved,
this has to be OK.

>> Here's a recent example of a contributor responding exactly that way
>> to such code:

>> https://codereview.qt-project.org/#/c/161133

I fail to see a contributor responding the way you describe: you fixed
the problem - thank you - and you clearly reasoned about the code to do
so.  It ain't pretty, but you found something that worked.

>> The way to deal with functions like that is to extract loops and
>> other concepts into functions with names.

Breaking up long functions into conceptually compact parts is indeed a
recommended practice - and has been for decades.  This is uncontentious.
(Standard practice, in contrast, is some place else entirely - which is
why I do not regard software as an engineering discipline, yet.)

>> Delegating things like that has the effect of making the function
>> shorter. Adding comments instead would not have a positive effect on
>> the size of the function and good names are less likely to bitrot
>> than comments.

Sometimes a function mutates enough to make its name misguided - but
even so, that is more apt to be picked up in code review than comments
going stale.

>> In the above example you would start by extracting a qMaxValue
>> function.

... or just by calling some suitable .max() method on the container, if
it was sensibly defined.  This is a case of a standard pattern that any
API designer should take into account and that client code shouldn't
implement "by hand", as in your example.  The nice thing about the
standard algorithms is that they free container classes from the need to
implement endlessly many methods, such as max; an algorithm can use
standard iteration to implement a generic max.  Someone who needs
max-and-min can do likewise, returning a pair.  Someone who needs to
also know the key (or index) at which max (or min) happened can have a
generic solution using pairs in place of the simple values.  There's a
long long tail of less-often-wanted generics that traditional OO would
have needed to be methods of every container class, with a painful (and
likely jagged) cut-off as different containers vary how far down that
tail they reach.  Generic algorithms get rid of those methods and save
the API designer from needing to decide how far down that tail to ride.

The tricky stuff comes when you need to do something eccentric, like
finding the max but treating -1 or 0 specially (for whatever reason), or
doing something else entirely if the object whose attribute we're
maximising meets some condition on its other members.  For good reasons,
those typically don't fit nicely into the standard patterns; and trying
to shoe-horn them into those patterns tends to implicate traversing the
data-structure more times than you will if you use raw loops (smartly).

>> You could implement that function in terms of std::max_element, or
>> you could write a loop in the function if that's more
>> comfortable. Even in the latter case, it is not a 'raw loop'. You
>> have named a concept and using it will make other code more readable.

Where that is possible, yes - by all means - do that.  Indeed, one good
argument for using the standard patterns is so that raw for loops go
away wherever the code is "essentially straight-forward".  Then any
surviving for loop becomes a clear alert to the presence of something
out of the ordinary - such as essential complexity that we can't get
away from.  Coders still need to learn how to reason about such loops.
Be glad you don't have to cope with the spaghetti that flows from
liberal use of goto.

>> I'm not going to recommend replacing all loops with some Qt version
>> of range adaptors starting tomorrow.
>>
>> There are many small steps which have benefits for readability.

There are many ways we can improve our code-base.  Where changing to use
implementations of standard patterns is one of them, by all means do it.

Jaroslaw Staniek contributed:

> 1. No doubt the algorithms and such improve our programming experience
> and the simplifies activity that costs us most: reading/understanding
> the code and intents.

I agree with your focus: *where* standard implementations of common
patterns *do* help readers make sense of code, they (like anything else
that serves this goal) are welcome.

> 4. I perceive *consistency in naming* as a higher value [...] A
> building block of a project is its internal coding guideline.

It's also worth noting that differences between projects can actually be
helpful: when I'm looking at code in one project (e.g. Qt) that calls
code from another (e.g. OpenSSL), the difference in naming style helps
by alerting me to which bits of the code are "alien" and can't be relied
on to follow the *other* guidelines (behaviour, not just naming) of our
project.  This is an argument against adding STL-style aliases for
methods of our containers *unless* the container really does behave the
way STL containers do - i.e. when someone used to the STL forms
expectations based on STL-normality, due to seeing STL-style method
names on a container, that container had better actually behave fully
like an STL container, or they'll get surprised.

> Engineers are good at switching context once a while, but I am not
> sure they are good at switching it every few seconds.

Context-switches are expensive.  If I am obliged to make many of them
(as is not uncommon when keeping up with the endless stream of code
reviews) I get tired fast.  Fortunately I can take a nap if I have to.

To the extent that Qt Does Things Differently than STL, it is actually a
feature that the two camps name things differently - that cues the
reader in to which idioms to expect will work, based on which naming
scheme is in use.  To the extent Qt Works The Same Way As STL, it's good
to mirror STL naming in the Qt API.

	Eddy.