[Development] Views

Thu Jun 6 09:05:31 CEST 2019

On 2019-06-06 08:24, Joerg Bornemann wrote:
> On 6/5/19 5:49 PM, Mutz, Marc via Development wrote:
> 
>> As a library implementer, you are simply not _allowed_ the freedom to
>> use a convenient tool over the most efficient one. That is, to put it
>> mildly, a disservice to users and a disgrace to the profession of
>> programmers. 8KiB just to look up a pointer in a map of {string, int}?
>> That's 1/4th of the icache size of many processors!
> [...]
> 
> While I agree with this in general... every time we use a sorted vector
> as a dictionary replacement we're scattering implementation details all
> over the place, creating code that's much harder to read and easier to
> make mistakes in (*).
> 
> Maybe it's time for a general purpose dictionary class based on a 
> sorted
> vector?

FTR: More of the same: 
https://codereview.qt-project.org/c/qt/qtbase/+/264128 and 
https://codereview.qt-project.org/c/qt/qtbase/+/264129

I very strongly object to the notion that a find_if or lower_bound is 
harder to read. More code does _not_ equate to less readable, as Qt over 
and over has shown. There are different patterns involved than in using 
Qt containers, sure, and by all means, if find_if frightens you, then 
use a raw for loop, but this stuff is not rocket science. And depending 
on how you define 'mistakes', it's just as easy to make a mistake by 
forgetting qAsConst() on a Qt container than it is to, say, combine 
iterators from different containers.

As for QSortedVector/QFlatMap. There's a reason there's none in the std, 
yet, and it has to do with when to sort. In one of the patches above, we 
don't need to sort at all, because there're only ever O(10) elements in 
there. Sorting, as performed by a QFlatMap would be overkill there. In 
another, we don't even store the key, as it's equal to the position of 
the element in the array. Sorting the key would be nonsense. Oftem, you 
populate the data structure once and then only perform lookups. In that 
case, a QFlatMap would waste time sorting while you don't need it. So, 
yes, by all means, let's have a QFlatMap, but it would just be another 
over-complicated container that people misuse. Let's, as a community, 
learn how to use a raw vector (or array) first, then introduce 
convenience.

Don't pick a container by it's API. Pick it by how you use it. No-one 
would use a RB tree for O(10) items if he had to implement it himself. 
You wouldn't even implement one for O(1M) elements if insertions in the 
middle are very infrequent. You are CS engineers. What would Knuth say 
if he caught you using a RB tree with a static maximum of 10 entries? 
There's a reason for this. It's horribly slow, and only used because in 
very limited circumstances, evenry other container is _even_ slower. 
It's a very, very, complex beast. Just because the compiler writes it 
for you at nothing more than a mention of QMap<T,V>, doesn't mean it's 
less complex. And that complexity doesn't go away just because you wrap 
it in a nice API: the compiler has a hard time with it, and so does the 
CPU, as evidenced by the O(KiB) savings involved in each replacement of 
a QMap with a vector. Let's also not forget the memory overhead: a 
QMap<int, int> uses at least 24 bytes of of storage per element. Plus 
allocation overhead. Some platforms (Windows? At some point at least?) 
didn't hand out heap in less than 64 bytes. That's 64 bytes of memory 
for 8 bytes of payload. A vector uses exactly 8. So a map uses anywhere 
between 3x and 8x more memory.

Just ask yourself: if you didn't have QMap/std::map or the hashed 
versions, what data structure would you use? If the answer _actually_ is 
"a RB tree (because I really need to insert and remove and lookup 
roughly the same number of times", then fine, go use QMap. If it 
_actually_ is "a hash table", then consider QHash/unorderd_map. Or maybe 
an Open Addressing hash table would be better? BTW: with vector, you can 
even implement a Heap (std::make/push/pop_heap).

There's no replacement for thinking here. There's no data structure that 
will work best in all cases.

Thanks,
Marc