[Development] The age-old T* foo vs. T *foo

Fri Oct 18 18:53:32 CEST 2019

Jason H (18 October 2019 01:38) asked:
> How many code parsers would this change (i.e. QtCreator or QDoc?) (if
> any)?

I would consider it a bug in any of our parsers if it cared at all about
the placement of spaces.  I am tolerably confident QDoc (based, now,
on Clang, from the LLVM project) won't need any change for this.

The distinction here goes back to C.  Even there, it's a fussy detail.
Although the Qt coding style does indeed ask for one declaration per
line (which is generally a good practice, regardless of the present
discussion, although I'd make exceptions when variables are closely
related, such as paired x, y co-ordinates; or the year, month, day parts
of a date), the common illustration

  int *p, i;

does point to something significant about the semantics of type
declarations in C (and, thus, C++).  Consider also:

  char array[size], *pointer = array;

Even without the part after the comma, notice that the type of array is
char [size], yet you can't write

  char[size] array;

which the logic of

  char* pointer;

kinda wants.  With the part after the comma, it should be clear that the
form of a declaration in C isn't

  type name [, ...];

In fact, it's (slightly simplified)

  decl-specifiers  init-declarator [, ...];

where the name is embedded somewhere in the init-declarator, but much of
the type information can be in the init-declarator with it; the
decl-specifiers only specify a base type and the storage class.  For
example,

  void (*handler)(int);

declares handler to be a pointer to a function that takes an int and
returns void; the decl-specifiers is just void, all the rest is the
init-declarator.  Next, consider (quoting a POSIX man-page):

  void (*signal(int sig, void (*func)(int)))(int);

declares signal to be a function (not a pointer to one) that takes an
int sig and a pointer of handler's type and returns a pointer of
handler's type.  Again, void is the whole decl-specifiers; all of the
rest is the init-declarator.  I don't expect most readers of code to
follow the latter (I have to think about it myself), so I would *always*
write it as

  typedef void (*signal_handler_p)(int);
  signal_handler_p signal(int, signal_handler_p);

to give readers slightly more of a chance of making sense of it.  The
typedef just echoes my declaration of handler, above, with "typedef
void" now as the decl-specifiers.  In the declaration of signal itself,
the initial signal_handler_p is the decl-specifiers; and the
init-declarator is all of the rest.
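
A compiler can check a reading like this.  The following is only a
sketch, in C++11, using hypothetical names (handler_p, install_raw,
install_typedef, install_ptr_t) rather than the real signal from
<signal.h>.  It asserts that the one-line spelling and the typedef
spelling declare the same function type, and that a pointer to such a
function is a different type again.

```cpp
#include <type_traits>

// Hypothetical names throughout; nothing here touches the real <signal.h>.
typedef void (*handler_p)(int);

// One-line form: a function taking (int, pointer-to-handler) and
// returning a pointer-to-handler.  The decl-specifiers is just "void".
void (*install_raw(int sig, void (*func)(int)))(int);

// The same declaration spelled with the typedef.
handler_p install_typedef(int sig, handler_p func);

// A pointer to such a function needs its own (*name) in the declarator.
typedef handler_p (*install_ptr_t)(int, handler_p);

static_assert(std::is_same<decltype(install_raw),
                           decltype(install_typedef)>::value,
              "both spellings declare the same function type");
static_assert(std::is_same<decltype(&install_raw), install_ptr_t>::value,
              "taking the address yields the pointer-to-function type");
static_assert(!std::is_same<decltype(install_raw), install_ptr_t>::value,
              "a function and a pointer to it are different types");
```

The functions are only declared, never defined or called, since
decltype does not evaluate its operand.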

The init-declarator is a potentially complex piece of text that contains
the name; the * or & of a pointer or reference declaration is part of
it, not part of the decl-specifiers.
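
To make that concrete, here is a small sketch (C++11, with names
invented purely for illustration) in which a single decl-specifier, int,
is shared by several init-declarators, each adding its own *, [] or &.
The static_asserts record the type each name ends up with.

```cpp
#include <type_traits>

// One decl-specifier (int) shared by four init-declarators.
// All names are illustrative only.
int plain = 0, *ptr = &plain, arr[3] = {1, 2, 3}, &ref = plain;

static_assert(std::is_same<decltype(plain), int>::value,
              "plain is an int");
static_assert(std::is_same<decltype(ptr), int *>::value,
              "ptr is a pointer to int");
static_assert(std::is_same<decltype(arr), int[3]>::value,
              "arr is an array of 3 int");
static_assert(std::is_same<decltype(ref), int &>::value,
              "ref is a reference to int");

// Moving the space does not change the grammar: the * still belongs to
// the first declarator only, so second is a plain int.
int* first = nullptr, second = 0;

static_assert(std::is_same<decltype(first), int *>::value,
              "first is a pointer to int");
static_assert(std::is_same<decltype(second), int>::value,
              "second is a plain int");
```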

Putting the * or & to the left of the space promotes the lazy reading of
a declaration as being

  type name;

which works just fine for the most common cases, but lulls readers into
using a mental model that will leave them entirely unprepared for the
subtleties that arise when anything trickier happens.

In X there's a callback type that returns void; I forget its name, but
naturally the X libraries typedef that callback type.  Long ago, I
cleaned up a mess that arose from some naive programmer defining
callbacks that returned that callback-type (rather than void) and took
the right argument lists.  Of course, he had to cast his callbacks to
the callback type when he tried to use them (because that wasn't their
actual type, but was what the API wanted).  Which meant his code was
littered with the callback type's name, where code that did the job
properly never mentioned the callback type's name *at all*.  Which meant
that other naive programmers, searching for something to copy and paste,
found his code and propagated his error at great length.  (When I
cleaned this all up, I took care to leave a comment next to each
callback's definition that named the callback type involved, so that
someone searching for that name would find correct code to copy.)  While
my fix for all of that was grinding its way through review and
integration, more instances of the error were introduced by naive
colleagues copying the same sources; I had to do a second round of
clean-up.

All of that arose because many programmers do not understand that type
declarations in C (hence also in C++) are more complicated than

  type name;

even though the callback parameter passed to the X library looked like
that in the library function's signature.
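
The shape of that mistake can be sketched as follows.  Everything here
is a hypothetical stand-in (callback_t, register_callback, on_event,
on_event_wrong); none of it is the real X API, whose names I have not
tried to reproduce.  The sketch is C++11 and only checks types.

```cpp
#include <type_traits>

// Hypothetical stand-in for the library's callback typedef.
typedef void (*callback_t)(int event);

// Hypothetical library entry point.  Its parameter does look like
// "type name", which is what misleads the naive reader.
inline void register_callback(callback_t cb) { if (cb) cb(0); }

// Correct callback: returns void, takes the right arguments, and never
// mentions callback_t in its own definition.
inline void on_event(int) {}

// The mistake: treating "callback_t name" as the way to declare a
// callback.  This is a function that *returns* a callback_t.
inline callback_t on_event_wrong(int) { return nullptr; }

static_assert(std::is_same<decltype(&on_event), callback_t>::value,
              "the correct callback already has the callback type");
static_assert(!std::is_same<decltype(&on_event_wrong), callback_t>::value,
              "the mistaken one has a different type, hence the casts");

// register_callback(on_event);        // accepted as-is
// register_callback(on_event_wrong);  // rejected without a cast
```

Forcing the second call through with a cast compiles, but calling a
function through a pointer of the wrong function type is undefined
behaviour.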

The (frankly minor) Good Thing about putting the * or & adjacent to the
name in a pointer or reference declaration is that it's a little
background reminder to everyone that some of the type info appears
syntactically bound to the name, not to the *base* type on the left.
This gives them a slightly better chance of coping when they meet less
trivial types.  It's a tiny difference, but it's a nudge in the right
direction.

And, in any case, it is senseless to change this across a whole
code-base.  We do have instances of the space after & or * and mostly we
leave them alone, unless we happen to be either changing the relevant
line anyway or doing a general clean-up.
Despite preferring the space before the * or &, if I walked into a
project that did it the other way, I'd just grin and bear it: it makes
no sense to change, once there's a significant body of code in a
consistent style - consistency is more important than this, either way
round.

	Eddy.