[Interest] QMap operator<() overload.

Thu Feb 26 08:33:33 CET 2015

K. Frank schreef op 25-2-2015 om 16:46:
> Hello Kai and Torgeir!
>
> Thank you for your insightful comments.
>
> I have some follow-up questions, in line, below.  (And sorry
> for moving the discussion away from Qt.)
>
> On Wed, Feb 25, 2015 at 9:45 AM, Torgeir Lilleskog
> <torgeir.lilleskog at gmail.com> wrote:
>> I like this talk by Chandler Carruth:
>> https://www.youtube.com/watch?v=fHNmRkzxHWs
>>
>> He says something along the lines of "I don't think I've ever seen a problem
>> for which std::map is the right data structure"
The problem is, I think, that QMap and QHash offer an API that is 
suitable for these lookups, while a QVector<QPair<T1,T2> > doesn't 
(insert std equivalents as required). That makes it way more attractive 
to use them for this type of problem, even if the underlying data 
structures don't really perform all that well in practice for many 
problems. So, the API matches the problem, but the data structure 
doesn't. And because not only performance, but also code reusability and 
readability count, the container with the API matching the task is 
chosen most often. I guess introducing a QMap or QHash like API over a 
simple vector-of-pairs would help mitigate that issue. It would be nice 
if it would allow some tuning, with things like doing a batch of inserts 
and only then re-sorting, or just saying you don't need the data to be 
sorted at all.

Also, people (including me, up until recently) are thrown off by these 
complexity notations. These are algorithmic complexities, but they 
little about the actual performance, and now much less than they used to 
back in times where memory and processor speeds were more on equal 
levels. An algorithm may be O(N), but the implementation running on your 
machine may still be faster than an implementation of a O(log(N)) or 
even an O(1) algorithm.
> I have come to prefer std::unordered_map (a hash-map) to
> std::map, *except* when I need the entries sorted for some
> reason other than look-up (e.g., iterating over them to process
> them in sorted order).
>
> I would speculate that std::map (or QMap) would be the right
> choice when:
>
>     1)  modifications (insertions / deletions) are mixed with access
>     (look-up)
>
> *and*
>
>     2)  there is need to iterate over the collection ins sorted order
>
> Does that sound right?
Could be, but there are more considerations, like space requirements, 
and how many and what types of modifications are required in relation 
with the accesses for instance.  It could still be faster to maintain a 
vector that is only sorted just before you actually need to iterate over 
it.
>> QMap is the same type of data structure as std::map.
>>
>> If your map is small and not expected to scale and you are comfortable with
>> the performance of a map for this data it will be OK. Measure if in doubt
>> :-)
>>
>> I think it is really good for programmers to be aware of the cost of using
>> data structures like linked lists/maps/hashes.
>> The costs/tradeoffs have really changed the last 10+ years or so. What used
>> to be a good data structure in the 90s may have absolutely terrible
>> performance today. And it is not just some percentage differences, sometimes
>> one may see 10x-50x performance difference from a change in data layout
>> alone.
>>
>> Best regards,
>> - Torgeir
>>
>> On Wed, Feb 25, 2015 at 2:19 PM, Koehne Kai <Kai.Koehne at theqtcompany.com>
>> wrote:
>>>> -----Original Message-----
>>>> From: interest-bounces+kai.koehne=theqtcompany.com at qt-project.org
>>>> [...]
>>>> Could you elaborate a little on your comment, and on for which use cases
>>>> QMap might not be the "right" abstraction?
>>>>
>>>> I've always thought that maps (and hash-maps, e.g. std::unordered_map)
>>>> were the right tool when you want to look up a value using a key.  Are
>>>> there
>>>> pitfalls in this use case?
>>> As usual with performance, it highly depends on how you use it ... Anyhow,
>>> this was recently discussed on the development at qt-project.org mailing ist:
>>>
>>> http://permalink.gmane.org/gmane.comp.lib.qt.devel/20044
>>>
>>>> Or are there other use cases where you often see people using maps when
>>>> something else would be more efficient?
>>> Citing from the above:
>>>
>>> " In the vast majority of cases, a sorted vector is much faster, due to
>>> locality-of-reference. A RB-tree is optimized for wildly mixing insertions,
>>> lookups, and removals. When you can batch updates and separate them from
>>> lookups, time-wise, then a sorted vector is usually preferrable."
> My take-away from this is that for a static (or semi-static) map,
> a binary search into a sorted vector (e.g., of pairs, as you mentioned
> in your earlier post) beats a sorted map (e.g., std::map).  But when
> modifications (insertions, removals) are mixed with access, the
> sorted map is likely to win out.
>
> Would this be your rule of thumb?
No, not for me. It would depend on more than that. Since reading up on 
issues like this recently (mainly due to the topic mentioned before), my 
first feeling is to avoid these elaborate data structures unless I am 
sure it really has a benefit. On the other hand: the API benefits would 
certainly be a factor as well for non-critical code sections.
> I would also assume that a hash-map (e.g., std::unordered_map)
> beats a sorted vector, even for access (except when there are a
> lot of hash collisions or you need a sorted data structure for other
> reasons).
Indeed, that may very well be true. The problem is that comparing 
objects like strings may be quite expensive, while comparing a hash-key 
is very cheap. Measuring helps (but it not easy, and not easy to 
generalize to other systems).

André