[Development] [SPAM] How bad QList really is
Giuseppe D'Angelo
giuseppe.dangelo at kdab.com
Tue Apr 28 09:56:35 CEST 2020
On 4/25/20 4:49 PM, André Pönitz wrote:
> Spam detection software, running on the system "mx.qt-project.org",
> has identified this incoming email as possible spam. The original
[*SIGHS*]
> We all know the story that began with
>
> "We knew for a long time that QList is not a good default
> container, despite what the documentation claims. The problem
> boils down to the fact that for a lot of types T, QList<T> is
> needlessly inefficient by allocating elements on the heap and
> storing pointers to them instead of storing the elements
> in-place, like e.g. QVector does. Sometimes, for large and
> complex types, that might be exactly what you want, but a
> conservative estimate would put that chance at less than 5%." [1]
>
>
> I was curious how this "conservative" estimate of "less than 5%"
> manifests in real QList object instances in a real world example.
> My random picks of real world examples usually end up with "Loading
> the Qt Creator project in Qt Creator itself", and "statistics" is
> something I make up myself, in this case it is looking at QList objects
> at destruction time, and sorting them a bit according to size of
> the items in the list, number of items still present at that time etc.
>
> Besides being simple to do it's rather close to the typical use I see,
> where containers are filled up, accessed/used a few times and destroyed
> without much intermediate size changes.
>
>
> Here is what I get:
>
> * total QList objects created : 51.631.363
> - destructors that only bump ref : 38.015.673 73.6%
> - destructors that actually dealloc : 13.615.690 26.4%
>
>
> The 13.615.690 instances with actual deallocations ordered by the
> size of the item type:
>
> size occurences
>
> 4 1.656 # 0.01 % with internal padding
>
> 8 13.424.228 # 98.59 % objects with ideal size,
>
>
> 12 3 \
> 16 74.979 |
> 24 60.560 |
> 28 126 |
> 32 22.358 |
> 40 2.054 |
> 48 18.786 |
> 56 71 > # 1.40 % with indirection
> 64 3 |
> 72 46 |
> 80 3.484 |
> 96 1 |
> 112 7.264 |
> 128 52 |
> 184 2 |
> 2112 17 /
>
>
> From the 8-byte objects we have
>
> 13.420.883 stored directly # 99.975 % of cases
> 3.345 stored indirectly # 0.025 %
>
> i.e. in 98.57% of all cases, QList objects behave optimally.
Before we enter confirmation bias here: could you also offer a breakdown
of the actual types involved? Basically, there's some important things
to discuss here:
1) First and foremost, in Qt 6 QString, QByteArray, QVector, are bigger
than a pointer (3 times a pointer size). So, in Qt 6:
* either QList stays unchanged, and now we heap allocate each element
for those cases too (thus it's necessary to know how the above
statistics change); or
* QList gets adapted so that its internal array allocates 3 *
sizeof(void*) per element, so that e.g. Q6StringList won't require a
per-item allocation. This would then also avoid allocations for other
datatypes, e.g. QStringView, QImage, maybe QVariant and QColor; but of
course waste a ton of space for the ones which remain small (most of Qt
implicitly shared datatypes).
2) Please re-run the count also for the QVector instances in Creator.
Under the "typical use" defined above, they could be converted to
QLists, couldn't they? How much would the totals change?
3) How many datatypes are defined by Creator (so they could be
considered "user defined", for the purposes of this exercise; although I
believe Creator also employs pimpl quite aggressively to keep BC, or am
I wrong here?), and how many are coming from Qt?
The confirmation bias comes from the fact that Qt datatypes are designed
to work nicely with QList; this for at least two good reasons:
a) Qt value types are normally pimpled (because of refcounting, or ABI
stability) so they fit exactly in QList's array slots, but user-defined
types aren't necessarily pimpled. Thus the bet: they're almost always
the wrong size for QList.
b) We've spent an awful amount of time reviewing Qt's source code and
tagging down every value type as movable (where it made sense). And I'd
like to extend a big thank you to Sérgio for clazy, Marc for tirelessly
fixing the code, Thiago for giving us relocatable types before Qt 6 in
QVector-but-not-QList-as-that's-BIC. Users simply forget the dreaded
typeinfo macro and so, for their types, QList is always in
array-of-pointers mode no matter the size of the datatype.
This is the area I fundamentally consider the "original sin" more than
anything else: QList can be a terrible default choice for end users and
their datatypes, while it's possibly still a good one for Qt and its
datatypes.
So what, we use it, then users can choose whatever they want? Sounds
nice in theory, but in practice the choice of QList percolates down into
user code -- at least up to the "UI layer", if not even further down. Qt
examples themselves contain code using QList over "wrong" user types!
(And I didn't even start explore the teachability "fun" coming with this.)
In conclusion, if QList is a good choice for Qt's own types, and even if
we then expose it at the API level as the Qt container of choice, its
only advantage then over QVector is going to be prepending? How often is
that a use case to justify the semantic burden of this extra container?
> This high percentage differs from the artificial benchmark scenarios not
> by accident. Almost all of these lists are lists of pointers, a natural
> ingredient of tree-like structures, common in user interfaces.
Lists of _actual_ pointers, or lists of refcounted types? That's why I
thoughts types were interesting to know. If it's the latter, unless
these types are changing in Qt 6 (like QString), the only difference
here w.r.t. using QVector would be prepending?
> Overall there are 333 different item types, grouped by size of type there is:
>
> 7 < 8 # 2.1 % of types
> 122 == 8 # 36.6 % of types
> 204 > 8 # 61.3 % of types
>
>
>
> What I see here is a general-purpose random-access container with cheaper
> insertion and deletion at front and in the middle than *vector provides for
> 61.3% of the types,
This cannot be claimed as a closed result: for insertion, it's ignoring
the cost of the individual allocation of the newly inserted item, that
needs to be traded off the moving of more bytes in memory.
Thanks for the scientific approach,
--
Giuseppe D'Angelo | giuseppe.dangelo at kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4329 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.qt-project.org/pipermail/development/attachments/20200428/571eca25/attachment-0001.bin>
More information about the Development
mailing list