[Development] qHash / QHash changes

Mon Mar 19 09:01:06 CET 2012

Hi,
sorry for the long message.

It's a pity to see that the proposed (and +2'd by Lars) improvement to
qHash(QString) [1] has been abandoned.

While I don't want to get involved in a flame about what's the
Best String Hashing Algorithm Ever Devised By an Human Being™,
I still think there's some room for making that change to go in.

I would actually split that patch this way:

1) Fix the various places inside Qt that rely on qHash to never change its
output across Qt versions, as well as the ordering of elements in a given
QHash. I think that's a sane requirement not to rely on these things, given the
fact that the hashing function can change anytime for a good number of reasons
(speed, quality of the hash, security issues).

QHash can go even further (see below) and change its internal ordering even in
the *same program run* in a *nondeterministic way*. That is, the identical
sequence of operations on two identical QHashes (f.i. two empty ones) are
allowed to produce two QHashes with a different iteration order.

In preparing [1] Robin had to go around and fix this problems here and there
(last build failure is apparently due to QtDbus tests relying on a specific
QHash iteration ordering).

Therefore, the idea is to make all those places use a private hashing function
if they truly need a never-changing one (because they're storing its output
somewhere, etc.); and stop to rely on QHash stable ordering, but use a QMap
instead.

2) Document to end users that it cannot be assumed that qHash output or QHash
iteration order are stable, so people must not rely on that.

3) Finally change qHash(QString) to... well, anything. DJB31XA proved to be
widely used (Java), faster and generating less collisions than the current
implementation. This is IMHO a good reason for changing. With 1+2 in place this
change can be deferred to some other version, if someone feels that more
research is needed.

***

Second, I'd like to get a fix the 2003 attack [2] on hash tables recently
re-proposed [3] against many major language programming implementations and
libraries. I don't see any reason for not fixing that in 9 years now that we
have the chance. I've built an almost-SC change [4] (WIP: no docs, no way to
disable the randomization, etc.). Kudos to Olivier Goffart for letting my brain
figuring out the obvious. :)

Comments?

Cheers,
--
Giuseppe D'Angelo

[1] http://codereview.qt-project.org/19098
[2] http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003.pdf
[3] http://www.ocert.org/advisories/ocert-2011-003.html
[4] http://codereview.qt-project.org/20410