views:

59

answers:

2

Why is it necessary for a hash table's (the data structure) size to be a prime?

From what I understand, it assures a more even distribution but is there any other reason?

+3  A: 

from http://srinvis.blogspot.com/2006/07/hash-table-lengths-and-prime-numbers.html

If suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this). Now you can do one of the following to avoid clustering

  1. Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions of entries.

  2. Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length is a prime number.

Sam
Than I guess my understanding was right: Avoid clustering <=> Get a better distribution. Right? Thanks for the reference.
Olivier Lalonde
+1  A: 

Whatever hashfunction you use you get an integer. In order to map that to the hashtable usually you'd mod the integer with the size of the hashtable to make that value smaller than the size of the table in order to map it.

return hashVal % tableSize

I'm a bit lost from this point onwards but IIRC if tableSize is even, all entries will be even. Half your hashtable will never be populated.

Afwas
That's another good point. And I believe the reason for a prime is it reduces the risk of patterns (for example 10,20,30,40 which will all give 0 if tableSize=10) in the hashVal which might result in a uneven distribution like mentioned by @Sam.
Olivier Lalonde
347 % 20 is 7, which is not even.
cskilbeck