tags:

views:

2163

answers:

6

What integer hash function are good that accepts an integer hash key?

+3  A: 

Depends on how your data is distributed. For a simple counter, the simplest function

f(i) = i

will be good (I suspect optimal, but I can't prove it).

erikkallen
Proof by contradiction. Assume that bucket b has more than one key: k1 and k2 where k1 != k2. Then f(k1) = k1 = b and f(k2) = k2 = b, by transitivity k1 = k2. QED. Although this function means that the size of your hash table is the size of your input set which completely defeats the point of a hash
Tyler McHenry
@Tyler: The hash container will use f(i) % size to decide which bucket to put the value in. However, only f(i) is the hash function, the modulo operation belongs to the hash table. The object (integer) can't know which hash table it will be put in.
erikkallen
The problem with this is that it's common to have large sets of integers that are divisible by a common factor (word-aligned memory adresses etc.). Now if your hash table happens to be divisible by the same factor, you end up with only half (or 1/4, 1/8, etc.) buckets used.
Rafał Dowgird
@Rafal: That's why the response says "for a simple counter" and "Depends on how your data is distributed"
erikkallen
+4  A: 

This page lists some simple hash functions that tend to decently in general, but any simple hash has pathological cases where it doesn't work well.

Tyler McHenry
+1  A: 

The answer depends on a lot of things like:

  • Where do you intend to employ it?
  • What are you trying to do with the hash?
  • Do you need a crytographically secure hash function?

I suggest that you take a look at the Merkle-Damgard family of hash functions like SHA-1 etc


Thanks mmeyers! For some reason, since the past few days, I:

  • can't read post-comments
  • can't use hyperlinks properly
  • am forced to use CAPTCHA for every single post
dirkgently
+1  A: 

There's a nice overview over some hash algorithms at Eternally Confuzzled. I'd recommend Bob Jenkins' one-at-a-time hash which quickly reaches avalanche and therefore can be used for efficient hash table lookup.

Christoph
That is a good article, but it is focused on hashing string keys, not integers.
Adrian Mouat
Just to be clear, although the methods in the article would work for integers (or could be adapted to), I assume there are more efficient algorithms for integers.
Adrian Mouat
+2  A: 

Knuth's multiplicative method:

hash(i)=i*2654435761 mod 2^32

In general, you should pick a multiplier that is in the order of your hash size (2^32 in the example) and has no common factors with it. This way the hash function covers all your hash space uniformly.

Edit: The biggest disadvantage of this hash function is that it preserves divisibility, so if your integers are all divisible by 2 or by 4 (which is not uncommon), their hashes will be too. This is a problem in hash tables - you can end up with only 1/2 or 1/4 of the buckets being used.

Rafał Dowgird
It's a really bad hash function, albeit attached to a famous name.
Seun Osewa
+2  A: 
  • 32-bits multiplicative method (very fast) see @rafal

    #define hash32(x) ((x)*2654435761)
    #define H_BITS 24 // Hashtable size
    #define H_SHIFT (32-H_BITS)
    unsigned hashtab[1<<H_BITS]  
    .... 
    unsigned slot = hash32(x) >> H_SHIFT
    
  • 32-bits and 64-bits (good distribution) at : MurmurHash

  • Integer Hash Function
bill