[Development] qHash / QHash changes

Mon Mar 19 17:49:27 CET 2012

Hi all,

thanks everybody for your feedback!

2012/3/19 Thiago Macieira <thiago.macieira at intel.com>:
> On segunda-feira, 19 de março de 2012 10.42.44, Thiago Macieira wrote:
>> [1] note that the order is stable. Two hashing tables with the same
>> elements  must produce the same order.
>
> Actually, thinking a little more about this, I might be wrong.
>
> If two elements produce the same hashing value (be it a collision or because
> of multiple insertions of the same item), their order is arbitrary. So suppose
> that B and C produce the same hashing, the following two hashing tables are
> still equal:
>
>        A, B, C
>        A, C, B
>
> Moreover, two elements may be stored in the same bucket even if they have
> different hashing values. And taken to an extreme, a degenerate hashing table
> (all elements in the same bucket) could have any order and still be
> equivalent.
>
> Given those properties, I think we can safely say that relying on two hashing
> tables with the same elements to have the same order is not acceptable.

This summarizes very well a part of the problem.

There's also another issue, that is,  the "history" that led two
hashes to have the same contents. If the two hashes did rebucket the
contents (f.i. one grew and then some contents were deleted without
letting it shrink) then the two hashes are still equal -- contain the
very same elements --, but iterating on them gives different results.

Example:

#include <QtCore>

int main()
{
    QHash<int, int> h1, h2;
    h2.reserve(100);
    h1[67003] = h2[67003] = 0;
    h1[101449] = h2[101449] = 1;
    QHash<int, int>::const_iterator i;
    for(i = h1.constBegin(); i != h1.constEnd(); ++i)
        qDebug() << i.key() << i.value();
    for(i = h2.constBegin(); i != h2.constEnd(); ++i)
        qDebug() << i.key() << i.value();
}

This prints:

67003 0
101449 1
101449 1
67003 0

> I think that's the rule that I violated in the test.

What I think happened there was relying on a *specific* ordering of a
forward iteration. Changing the hash function broke the test: the
elements were all there, but appeared in a different order.

In short, any of

1) changing the hash function
2) changing the QHash implementation
3) a different "history" of a QHash object (compared to another one)
4) a randomly salted QHash

can lead to a different iteration order. 1) and 2) are provided by Qt
itself, and the point is whether the user can rely on them being
immutable across Qt versions (I think not). As a side-effect of 1),
storing qHash output becomes a very bad idea. 3) is dependent on what
the user does, and it is already not reliable.

4) is instead questioned when it comes to having a salted QHash
(second part of my mail): different executions of the very same
program are allowed to have a different iteration order (the currenly
submitted patch actually bends it too much -- every QHash gets a
random seed. I don't think this is needed at all.). I plan a way to
turn this "feature" off, through environment variables, f.i. for
debugging purposes. But having a nondeterministic QHash behaviour can
be really surprising, I admit that.

So, should I go on with the plan I devised or it's too late for this
discussion? :)

Cheers,
-- 
Giuseppe D'Angelo