[Development] Views

Simon Hausmann Simon.Hausmann at qt.io
Thu Jun 6 09:47:31 CEST 2019


Am 06.06.19 um 09:05 schrieb Mutz, Marc via Development:
> On 2019-06-06 08:24, Joerg Bornemann wrote:
>> On 6/5/19 5:49 PM, Mutz, Marc via Development wrote:
>>
>>> As a library implementer, you are simply not _allowed_ the freedom to
>>> use a convenient tool over the most efficient one. That is, to put it
>>> mildly, a disservice to users and a disgrace to the profession of
>>> programmers. 8KiB just to look up a pointer in a map of {string, int}?
>>> That's 1/4th of the icache size of many processors!
>> [...]
>>
>> While I agree with this in general... every time we use a sorted vector
>> as a dictionary replacement we're scattering implementation details all
>> over the place, creating code that's much harder to read and easier to
>> make mistakes in (*).
>>
>> Maybe it's time for a general purpose dictionary class based on a sorted
>> vector?
>
> FTR: More of the same: 
> https://codereview.qt-project.org/c/qt/qtbase/+/264128 and 
> https://codereview.qt-project.org/c/qt/qtbase/+/264129
>
> I very strongly object to the notion that a find_if or lower_bound is 
> harder to read. More code does _not_ equate to less readable, as Qt 
> over and over has shown. There are different patterns involved than in 
> using Qt containers, sure, and by all means, if find_if frightens you, 
> then use a raw for loop, but this stuff is not rocket science. And 
> depending on how you define 'mistakes', it's just as easy to make a 
> mistake by forgetting qAsConst() on a Qt container than it is to, say, 
> combine iterators from different containers.
>
> As for QSortedVector/QFlatMap. There's a reason there's none in the 
> std, yet, and it has to do with when to sort. In one of the patches 
> above, we don't need to sort at all, because there're only ever O(10) 
> elements in there. Sorting, as performed by a QFlatMap would be 
> overkill there. In another, we don't even store the key, as it's equal 
> to the position of the element in the array. Sorting the key would be 
> nonsense. Oftem, you populate the data structure once and then only 
> perform lookups. In that case, a QFlatMap would waste time sorting 
> while you don't need it. So, yes, by all means, let's have a QFlatMap, 
> but it would just be another over-complicated container that people 
> misuse. Let's, as a community, learn how to use a raw vector (or 
> array) first, then introduce convenience.
>
> Don't pick a container by it's API. Pick it by how you use it. No-one 
> would use a RB tree for O(10) items if he had to implement it himself. 
> You wouldn't even implement one for O(1M) elements if insertions in 
> the middle are very infrequent. You are CS engineers. What would Knuth 
> say if he caught you using a RB tree with a static maximum of 10 
> entries? There's a reason for this. It's horribly slow, and only used 
> because in very limited circumstances, evenry other container is 
> _even_ slower. It's a very, very, complex beast. Just because the 
> compiler writes it for you at nothing more than a mention of 
> QMap<T,V>, doesn't mean it's less complex. And that complexity doesn't 
> go away just because you wrap it in a nice API: the compiler has a 
> hard time with it, and so does the CPU, as evidenced by the O(KiB) 
> savings involved in each replacement of a QMap with a vector. Let's 
> also not forget the memory overhead: a QMap<int, int> uses at least 24 
> bytes of of storage per element. Plus allocation overhead. Some 
> platforms (Windows? At some point at least?) didn't hand out heap in 
> less than 64 bytes. That's 64 bytes of memory for 8 bytes of payload. 
> A vector uses exactly 8. So a map uses anywhere between 3x and 8x more 
> memory.
>
> Just ask yourself: if you didn't have QMap/std::map or the hashed 
> versions, what data structure would you use? If the answer _actually_ 
> is "a RB tree (because I really need to insert and remove and lookup 
> roughly the same number of times", then fine, go use QMap. If it 
> _actually_ is "a hash table", then consider QHash/unorderd_map. Or 
> maybe an Open Addressing hash table would be better? BTW: with vector, 
> you can even implement a Heap (std::make/push/pop_heap).
>
> There's no replacement for thinking here. There's no data structure 
> that will work best in all cases.
>

You present very good technical arguments against the use of an RB tree 
in situations where it degrades code size and performance, which is 
something that I feel the majority of participants in this discussion 
agree with.


However I don't find your arguments that find_if/lower_bound is not 
harder to read convincing. I continue to agree with Joerg and Tor Arne 
and feel that the API of the associative containers results in code that 
is more compact, visually less noisy and consequently easier to read.


Simon



More information about the Development mailing list