[Development] [SPAM] How bad QList really is

Tue Apr 28 09:56:35 CEST 2020

On 4/25/20 4:49 PM, André Pönitz wrote:
> Spam detection software, running on the system "mx.qt-project.org",
> has identified this incoming email as possible spam.  The original

[*SIGHS*]

> We all know the story that began with
> 
>     "We knew for a long time that QList is not a good default
>     container, despite what the documentation claims. The problem
>     boils down to the fact that for a lot of types T, QList<T> is
>     needlessly inefficient by allocating elements on the heap and
>     storing pointers to them instead of storing the elements
>     in-place, like e.g. QVector does. Sometimes, for large and
>     complex types, that might be exactly what you want, but a
>     conservative estimate would put that chance at less than 5%." [1]
> 
> 
> I was curious how this "conservative" estimate of "less than 5%" 
> manifests in real QList object instances in a real world example.  

> My random picks of real world examples usually end up with "Loading
> the Qt Creator project in Qt Creator itself", and "statistics" is
> something I make up myself, in this case it is looking at QList objects
> at destruction time, and sorting them a bit according to size of
> the items in the list, number of items still present at that time etc.
> 
> Besides being simple to do it's rather close to the typical use I see,
> where containers are filled up, accessed/used a few times and destroyed
> without much intermediate size changes.
> 
> 
> Here is what I get:
> 
> * total QList objects created          :    51.631.363   
>   - destructors that only bump ref     :    38.015.673      73.6%
>   - destructors that actually dealloc  :    13.615.690      26.4% 
> 
> 
> The  13.615.690  instances with actual deallocations ordered by the 
> size of the item type:
> 
>     size     occurences
> 
>      4         1.656        #  0.01 %   with internal padding
>                            
>      8     13.424.228       # 98.59 %   objects with ideal size, 
>                                     
> 
>      12             3    \
>      16        74.979     |
>      24        60.560     |
>      28           126     |
>      32        22.358     |
>      40         2.054     |
>      48        18.786     |
>      56            71     >  # 1.40 %  with indirection
>      64             3     |
>      72            46     |
>      80         3.484     |
>      96             1     |     
>      112        7.264     |
>      128           52     |     
>      184            2     |
>      2112          17    /
> 
> 
> From the 8-byte objects we have
>  
>         13.420.883  stored directly      #  99.975 %  of cases
>              3.345  stored indirectly    #   0.025 %
> 
> i.e. in 98.57% of all cases, QList objects behave optimally.

Before we enter confirmation bias here: could you also offer a breakdown 
of the actual types involved? Basically, there's some important things 
to discuss here:

1) First and foremost, in Qt 6 QString, QByteArray, QVector, are bigger 
than a pointer (3 times a pointer size). So, in Qt 6:

* either QList stays unchanged, and now we heap allocate each element 
for those cases too (thus it's necessary to know how the above 
statistics change); or

* QList gets adapted so that its internal array allocates 3 * 
sizeof(void*) per element, so that e.g. Q6StringList won't require a 
per-item allocation. This would then also avoid allocations for other 
datatypes, e.g. QStringView, QImage, maybe QVariant and QColor; but of 
course waste a ton of space for the ones which remain small (most of Qt 
implicitly shared datatypes).

2) Please re-run the count also for the QVector instances in Creator. 
Under the "typical use" defined above, they could be converted to 
QLists, couldn't they? How much would the totals change?

3) How many datatypes are defined by Creator (so they could be 
considered "user defined", for the purposes of this exercise; although I 
believe Creator also employs pimpl quite aggressively to keep BC, or am 
I wrong here?), and how many are coming from Qt?

The confirmation bias comes from the fact that Qt datatypes are designed 
to work nicely with QList; this for at least two good reasons:

a) Qt value types are normally pimpled (because of refcounting, or ABI 
stability) so they fit exactly in QList's array slots, but user-defined 
types aren't necessarily pimpled. Thus the bet: they're almost always 
the wrong size for QList.

b) We've spent an awful amount of time reviewing Qt's source code and 
tagging down every value type as movable (where it made sense). And I'd 
like to extend a big thank you to Sérgio for clazy, Marc for tirelessly 
fixing the code, Thiago for giving us relocatable types before Qt 6 in 
QVector-but-not-QList-as-that's-BIC. Users simply forget the dreaded 
typeinfo macro and so, for their types, QList is always in 
array-of-pointers mode no matter the size of the datatype.

This is the area I fundamentally consider the "original sin" more than 
anything else: QList can be a terrible default choice for end users and 
their datatypes, while it's possibly still a good one for Qt and its 
datatypes.

So what, we use it, then users can choose whatever they want? Sounds 
nice in theory, but in practice the choice of QList percolates down into 
user code -- at least up to the "UI layer", if not even further down. Qt 
examples themselves contain code using QList over "wrong" user types!

(And I didn't even start explore the teachability "fun" coming with this.)

In conclusion, if QList is a good choice for Qt's own types, and even if 
we then expose it at the API level as the Qt container of choice, its 
only advantage then over QVector is going to be prepending? How often is 
that a use case to justify the semantic burden of this extra container?

> This high percentage differs from the artificial benchmark scenarios not
> by accident. Almost all of these lists are lists of pointers, a natural
> ingredient of tree-like structures, common in user interfaces.

Lists of _actual_ pointers, or lists of refcounted types? That's why I 
thoughts types were interesting to know. If it's the latter, unless 
these types are changing in Qt 6 (like QString), the only difference 
here w.r.t. using QVector would be prepending?

> Overall there are 333 different item types, grouped by size of type there is:
> 
>           7    <  8    #  2.1 %  of types
>         122   ==  8    # 36.6 %  of types
>         204    >  8    # 61.3 %  of types
> 
> 
> 
> What I see here is a general-purpose random-access container with cheaper
> insertion and deletion at front and in the middle than *vector provides for
> 61.3% of the types, 

This cannot be claimed as a closed result: for insertion, it's ignoring 
the cost of the individual allocation of the newly inserted item, that 
needs to be traded off the moving of more bytes in memory.

Thanks for the scientific approach,
-- 
Giuseppe D'Angelo | giuseppe.dangelo at kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4329 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.qt-project.org/pipermail/development/attachments/20200428/571eca25/attachment-0001.bin>