[Development] What's Q_PRIMITIVE_TYPE for?

Thu Nov 12 03:10:28 CET 2020

On Wednesday, 11 November 2020 10:14:26 PST Giuseppe D'Angelo via Development 
wrote:
> Hi,
> 
> On 11/11/2020 18:14, Thiago Macieira wrote:
> > So my recommendation is:
> >   1) deprecate Q_PRIMITIVE_TYPE and rename to Q_TRIVIAL_TYPE
> >   2) *not* use memset-to-zero construction anywhere
> > 
> > #2 implies changing QPodArrayOps, which does use memset, to use a loop
> > calling the default constructor. Two of the four compilers do optimise
> > that into a call into memset: https://gcc.godbolt.org/z/Ks3M5h. And
> > there's nothing the ICC team likes to work on more than losing on a
> > benchmark.
> 
> The problem is that ~100% of our value classes are not trivial, because
> we always initialize our data members. So, we need type traits anyhow to
> distinguish between primitive/relocatable/complex; and I am against
> calling it "Q_TRIVIAL_TYPE" because this property has now nothing to do
> with pure triviality.

Understood, but then what's the harm of using Q_RELOCATABLE_TYPE for them? 
Asked differently: if those classes initialise the members to a non-zero 
value, why is memset with zero acceptable as a construction?
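
To make the question concrete, here is a minimal sketch. The Margin class and
the helper function are hypothetical, made up for illustration; they are not
Qt code. It compares the bytes produced by the default constructor with the
bytes produced by memset-to-zero:

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical value class (not from Qt): trivially copyable, but its
// default-constructed state is not all-zero bits.
struct Margin {
    int left = -1;
    int right = -1;
};

// Returns true if memset-to-zero yields the same bytes as running the
// default constructor on raw storage.
inline bool memsetMatchesConstructor()
{
    alignas(Margin) unsigned char constructed[sizeof(Margin)];
    alignas(Margin) unsigned char zeroed[sizeof(Margin)];

    Margin *m = new (constructed) Margin;      // proper construction
    std::memset(zeroed, 0, sizeof(zeroed));    // "primitive" construction

    assert(m->left == -1 && m->right == -1);
    const bool same = std::memcmp(constructed, zeroed, sizeof(Margin)) == 0;
    m->~Margin();
    return same;
}
```

For this class the function returns false: a container that zero-fills its
storage would hand out objects the constructor never produces.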

> *Some* trivial types can be initialized via memset(0), but not all of
> them, so the set of primitive types (according to our current
> definition) and the set of trivial types are intersecting (*).

I propose we initialise none with memset. Don't try.

> In theory we could just rely on the optimizer to turn
> std::uninitialized_value_construct_n into a memset(0). (If you have an
> out of line constructor that does 0-bit initialization, and the compiler
> doesn't see it and do the transformation, you don't have my sympathies.)
> 
> This would, in principle, allow for unifying handling of primitive and
> relocatable types:
> 
> * Construction: use uninitialized_value_construct
>    * Primitive: the compiler figures out it's a memset()
>    * Relocatable: call the default constructor (and possibly the
> compiler figures out it's a memset())
> 
> * Copy: just use std::uninitialized_copy
>    * Primitive: the compiler figures out it's a memcpy()
>    * Relocatable: call the copy constructor (possibly the compiler
> figures out it's a memcpy())
> 
> * Move: just use std::uninitialized_move
>    * Primitive: the compiler figures out it's a memcpy()
>    * Relocatable: call the move constructor (possibly the compiler
> figures out it's a memcpy())
> 
> * Destruction: just use std::destroy
>    * Primitive: compiler does nothing
>    * Relocatable: call the destructors (possibly do nothing if trivial)

Agreed, except for the part of using the Standard Library functions. Just loop 
around the block of memory and call the proper constructors using placement 
new or the destructor.
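
A rough sketch of what such loops could look like. The helper names
(constructRange, destroyRange), the Counted test type and aliveCount() are
hypothetical and only serve the illustration; this is not the actual
QArrayDataOps code:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Value-construct n objects in raw storage, one placement new per element.
template <typename T>
void constructRange(T *first, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        new (static_cast<void *>(first + i)) T();
}

// Destroy n objects by calling each destructor explicitly.
template <typename T>
void destroyRange(T *first, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        first[i].~T();
}

// Hypothetical instrumented type to observe constructor/destructor calls.
inline int &aliveCount() { static int count = 0; return count; }

struct Counted {
    int value = 7;
    Counted() { ++aliveCount(); }
    ~Counted() { --aliveCount(); }
};

// Allocates raw memory, constructs four objects, sums their values,
// then destroys them and releases the memory.
inline int roundTrip()
{
    constexpr std::size_t n = 4;
    void *raw = ::operator new(n * sizeof(Counted));
    Counted *items = static_cast<Counted *>(raw);

    constructRange(items, n);
    assert(aliveCount() == 4);

    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += items[i].value;

    destroyRange(items, n);
    ::operator delete(raw);
    return sum;
}
```

For a type whose constructor only writes zeroes and whose destructor is
trivial, the hope is that the optimiser collapses these loops into a memset
and nothing at all, respectively.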

> But as far as I can tell, compilers do not do these transformations as
> aggressively as we'd like. So we still have a distinct advantage at
> using the trait, at least for the Qt 6 lifetime. Take for instance
> QStringView:
> 
> https://gcc.godbolt.org/z/6Taoo4
> 
> GCC, ICC, MSVC don't optimize anything. Clang chokes on the
> (pointer,int) scenario, but only if the initialization goes through a
> constructor. Don't ask me why.

(pointer,int) isn't applicable for us any more because we don't have that 
case. Clang chokes because of the 4-byte tail padding in that structure. When 
you define a non-default constructor, the compiler decides to expand to the 
exact code you wrote, instead of taking the liberty of writing to those 
padding bits (which *you* can't do, but the compiler can).
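
The padding can be made visible with a small sketch. Both structs below are
hypothetical stand-ins for a (pointer,int) view type, not the real
QStringView. The amount of padding depends on the ABI: on a typical 64-bit
target it is 4 bytes, on a typical 32-bit target it is 0.

```cpp
#include <cassert>
#include <cstddef>

// Aggregate: value-initialisation may zero the whole object, padding included.
struct ViewAggregate {
    const char16_t *data;
    int size;
};

// Same layout, but with a user-provided constructor that writes only the
// two members and says nothing about the padding.
struct ViewWithCtor {
    const char16_t *data;
    int size;
    ViewWithCtor() : data(nullptr), size(0) {}
};

// Bytes after the last member that belong to the object but to no member.
constexpr std::size_t aggregatePadding =
    sizeof(ViewAggregate) - (offsetof(ViewAggregate, size) + sizeof(int));
constexpr std::size_t ctorPadding =
    sizeof(ViewWithCtor) - (offsetof(ViewWithCtor, size) + sizeof(int));

// The constructor changes how the object is initialised, not its layout.
inline void checkLayouts()
{
    assert(sizeof(ViewAggregate) == sizeof(ViewWithCtor));
    assert(aggregatePadding == ctorPadding);
}
```

A single memset over an array of ViewWithCtor has to write those padding
bytes too, which is the liberty the compiler did not take in the constructor
case.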

I think having a trait that effectively requires "this type can be constructed 
by a memset of 0" is risky, even if we've been doing it all along. First, 
because it depends on the representation of pointers (that NULL is zero bits) 
and it's not impossible for some pointer to member object (PMO) to exist in 
the class, unnoticed. It requires the developer to know the byte 
representation of their class.
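
As an illustration of the representation problem, here is a sketch with
hypothetical names (isAllZeroBits, Widget, zeroBytesYieldNullMember). It
inspects the bytes of a value and checks what a zero-filled pointer to member
compares equal to. No result is hard-coded for the pointer-to-member case
because it is ABI-specific; as far as I know, the Itanium C++ ABI represents
a null pointer to data member as -1, not 0.

```cpp
#include <cassert>
#include <cstring>

// Returns true if every byte of the object representation is zero.
template <typename T>
bool isAllZeroBits(const T &value)
{
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    for (unsigned char b : bytes) {
        if (b != 0)
            return false;
    }
    return true;
}

// Hypothetical class used only to form a pointer to member.
struct Widget {
    int width;
    int height;
};

// Builds a pointer to member from zero bytes, the way a memset-based
// "constructor" would, and reports whether the result is a null pointer.
inline bool zeroBytesYieldNullMember()
{
    int Widget::*field;
    std::memset(&field, 0, sizeof(field));
    return field == nullptr;
}
```

Where the null member pointer is not all-zero bits, the zero-filled value is
a valid pointer to the member at offset 0, so a "null" check on it silently
passes.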

Second and more importantly, because compilers are moving towards object 
lifetime tracking and we ought to visibly call a constructor and a destructor. 
This will eventually include the need to inform the compiler of the lifetime 
of the arrays themselves, but there's no language solution yet, so we ignore 
it for now.

And we file "missed optimisation" reports to GCC and Clang. (As I said before, 
you get ICC to optimise by creating a benchmark that shows it performs worse 
than the competition; MSVC hasn't figured in "highly optimised code" for 
nearly a decade now and they have a lot to catch up with first)

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering