I ran across the question below in an old exam. My answer just feels a bit short and inadequate. Any extra ideas I could look into, or reasons I have overlooked, would be great. Thanks.

Consider the MAD method compression function, mapping an object with hash code i to element [(3i + 7) mod 9027] mod 6000 of the 6000-element bucket array. Explain why this is a poor choice of compression function, and how it could be improved.

I basically just say that the function could be improved by changing the value of p (the 9027) to a prime number, and that choosing another constant for a (the 3) could also help.

A: 

If i is uniformly distributed over a large enough range, then even if (3i + 7) mod 9027 were evenly distributed over 0-9026, taking mod 6000 would fold the values 6000 to 9026 back onto 0 to 3026. So about two thirds of the hashes end up in the first half of the bucket range (buckets 0 to 3026, fed by values 0 to 3026 and 6000 to 9026 inclusive), and only one third in the second half (3027 to 5999 inclusive).
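To make the fold concrete, here is a minimal Python sketch. It assumes, purely for illustration, that every residue 0..9026 is equally likely before the final mod 6000. The names `P`, `N`, `hits`, `low` and `high` are mine, not from the question.

```python
from collections import Counter

P, N = 9027, 6000  # modulus and bucket count from the question

# Assumption: every residue 0..P-1 is equally likely before the final mod N.
hits = Counter(r % N for r in range(P))

low = sum(c for b, c in hits.items() if b < P - N)    # buckets 0..3026
high = sum(c for b, c in hits.items() if b >= P - N)  # buckets 3027..5999

print(low, high)          # 6054 2973
print(round(low / P, 3))  # 0.671
```

Each of the buckets 0 to 3026 is hit by two residues and each of the rest by one, which is where the two-thirds / one-third split comes from.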

Pete Kirkham
+3  A: 

Rup's comment is essentially the correct answer. 3 and 9027 are both divisible by 3, so (3i + 7) mod 9027 maps onto only 1/3 of the range 0-9026 (the values congruent to 1 mod 3). Since 6000 is also divisible by 3, only a third of the buckets can ever be hit. Then the mapping mod 6000 maps 2/3 of the values to the lower half. So bucket 1 will contain roughly 1/1500 of the values [if I've done the math right] rather than the 1/6000 you would want. Bucket 0 will be empty.
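A quick way to check those numbers is to count the buckets directly. This is only a sketch: the helper `mad` and the replacement modulus 9029 are my own illustrative choices, not something given in the exam question.

```python
from collections import Counter

def mad(i, a=3, b=7, p=9027, n=6000):
    """MAD compression from the question: ((a*i + b) mod p) mod n."""
    return ((a * i + b) % p) % n

# Hash codes 0..9026 cover three full periods of (3i + 7) mod 9027.
counts = Counter(mad(i) for i in range(9027))
used = len(counts)           # number of buckets that receive anything
share = counts[1] / 9027     # fraction of hash codes landing in bucket 1

print(used)                  # 2000
print(counts[0], counts[1])  # 0 6
print(round(1 / share, 1))   # 1504.5

# Hypothetical fix: p = 9029 shares no factor with a = 3 (my pick, not from the thread).
fixed = Counter(mad(i, p=9029) for i in range(9029))
print(len(fixed), max(fixed.values()))  # 6000 2
```

With p = 9029 every bucket is reachable, but the low buckets still get twice the load because p is only about 1.5 times the table size. Picking a prime p much larger than 6000 would shrink that imbalance.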

Brian
Thanks, all the answers gave me that little bit of extra info and insight that I sought :)
Jacques