views:

100

answers:

4

Sometimes you need to take a hash function of a pointer; not the object the pointer points to, but the pointer itself. Lots of the time, folks just punt and use the pointer value as an integer, chop off some high bits to make it fit, maybe shift out known-zero bits at the bottom. Thing is, pointer values aren't necessarily well-distributed in the code space; in fact, if your allocator is doing its job, there's an excellent chance they're all clustered close together.

So, my question is, has anyone developed hash functions that are good for this? Take a 32- or 64-bit value that's maybe got 12 bits of entropy in it somewhere and spread it evenly across a 32-bit number space.

+6  A: 

This page lists several methods that might be of use. One of them, due to Knuth, is a simple as multiplying (in 32 bits) by 2654435761, but "Bad hash results are produced if the keys vary in the upper bits." In the case of pointers, that's a rare enough situation.

Here are some more algorithms, including performance tests.

It seems that the magic words are "integer hashing".

Thomas
And when you search for "integer hashing", you get pointed to another SO page that this one effectively duplicates. :-)
Steven Sudit
Thanks. It didn't occur to me to search for "integer hashing" because I was stuck on the values being *pointers*, but those pages look very helpful.
Zack
But on a 32-bits system the upper bits of addresses can very well be in use...
Henk Holterman
A: 

Why not just use an existing hash function?

DShook
I suspect their motivation is speed.
Steven Sudit
+1  A: 

They'll likely exhibit locality, yes - but in the lower bits, which means objects will be distributed through the hashtable. You'll only see collisions if a pointer's address is a multiple of the hashtable's length from another pointer.

Nick Johnson
That's not my intuition. I'd expect a typical (32-bit) pointer into the heap to be of the form `CCCC XXX8` (hexadecimal) - high half constant or nearly so, *maybe* 12 bits of entropy in the low half, lowest nybble nigh-constant again. And the low half is likely to tick up by a number with a lot of twos in its prime factorization.
Zack
You already mentioned shifting the low bits out, though. If that's all the bits of entropy there are in the number, no amount of hashing is going to increase that, though.
Nick Johnson