[Development] Views
Mutz, Marc
marc at kdab.com
Thu Jun 6 09:05:31 CEST 2019
On 2019-06-06 08:24, Joerg Bornemann wrote:
> On 6/5/19 5:49 PM, Mutz, Marc via Development wrote:
>
>> As a library implementer, you are simply not _allowed_ the freedom to
>> use a convenient tool over the most efficient one. That is, to put it
>> mildly, a disservice to users and a disgrace to the profession of
>> programmers. 8KiB just to look up a pointer in a map of {string, int}?
>> That's 1/4th of the icache size of many processors!
> [...]
>
> While I agree with this in general... every time we use a sorted vector
> as a dictionary replacement we're scattering implementation details all
> over the place, creating code that's much harder to read and easier to
> make mistakes in (*).
>
> Maybe it's time for a general purpose dictionary class based on a
> sorted
> vector?
FTR: More of the same:
https://codereview.qt-project.org/c/qt/qtbase/+/264128 and
https://codereview.qt-project.org/c/qt/qtbase/+/264129
I very strongly object to the notion that a find_if or lower_bound is
harder to read. More code does _not_ equate to less readable, as Qt over
and over has shown. There are different patterns involved than in using
Qt containers, sure, and by all means, if find_if frightens you, then
use a raw for loop, but this stuff is not rocket science. And depending
on how you define 'mistakes', it's just as easy to make a mistake by
forgetting qAsConst() on a Qt container than it is to, say, combine
iterators from different containers.
As for QSortedVector/QFlatMap. There's a reason there's none in the std,
yet, and it has to do with when to sort. In one of the patches above, we
don't need to sort at all, because there're only ever O(10) elements in
there. Sorting, as performed by a QFlatMap would be overkill there. In
another, we don't even store the key, as it's equal to the position of
the element in the array. Sorting the key would be nonsense. Oftem, you
populate the data structure once and then only perform lookups. In that
case, a QFlatMap would waste time sorting while you don't need it. So,
yes, by all means, let's have a QFlatMap, but it would just be another
over-complicated container that people misuse. Let's, as a community,
learn how to use a raw vector (or array) first, then introduce
convenience.
Don't pick a container by it's API. Pick it by how you use it. No-one
would use a RB tree for O(10) items if he had to implement it himself.
You wouldn't even implement one for O(1M) elements if insertions in the
middle are very infrequent. You are CS engineers. What would Knuth say
if he caught you using a RB tree with a static maximum of 10 entries?
There's a reason for this. It's horribly slow, and only used because in
very limited circumstances, evenry other container is _even_ slower.
It's a very, very, complex beast. Just because the compiler writes it
for you at nothing more than a mention of QMap<T,V>, doesn't mean it's
less complex. And that complexity doesn't go away just because you wrap
it in a nice API: the compiler has a hard time with it, and so does the
CPU, as evidenced by the O(KiB) savings involved in each replacement of
a QMap with a vector. Let's also not forget the memory overhead: a
QMap<int, int> uses at least 24 bytes of of storage per element. Plus
allocation overhead. Some platforms (Windows? At some point at least?)
didn't hand out heap in less than 64 bytes. That's 64 bytes of memory
for 8 bytes of payload. A vector uses exactly 8. So a map uses anywhere
between 3x and 8x more memory.
Just ask yourself: if you didn't have QMap/std::map or the hashed
versions, what data structure would you use? If the answer _actually_ is
"a RB tree (because I really need to insert and remove and lookup
roughly the same number of times", then fine, go use QMap. If it
_actually_ is "a hash table", then consider QHash/unorderd_map. Or maybe
an Open Addressing hash table would be better? BTW: with vector, you can
even implement a Heap (std::make/push/pop_heap).
There's no replacement for thinking here. There's no data structure that
will work best in all cases.
Thanks,
Marc
More information about the Development
mailing list