ansaurus

Question

What is a sensible prime for hashcode calculation?

Answer 1

A:

You need to define your range for i and j. You could use a prime number for both.

public int hashCode() {
   // see http://primes.utm.edu/curios/ ;)
   return (97654321 * i) ^ (12356789 * j);
}

Peter Lawrey 2009-12-02 21:52:51

Answer 2

+1 A:

Actually, if you take a prime so large that it comes close to INT_MAX, you have the same problem because of modulo arithmetic. If you expect to hash mostly strings of length 2, a prime near the square root of INT_MAX would perhaps be best. If the strings you hash are longer, it doesn't matter so much, and collisions are unavoidable anyway...

Pascal Cuoq 2009-12-02 21:54:16

Right, the modulo arithmetic makes the problem difficult and interesting. I think I'll write a little program to search for a good solution. :-)

hstoerr 2009-12-03 06:49:39

Answer 3

A:

I'd choose 7243. Large enough to avoid collisions with small numbers. Doesn't overflow to small numbers quickly.

ammoQ 2009-12-02 22:11:23

I use the first 1000 primes as a handy source for small prime numbers http://primes.utm.edu/lists/small/1000.txt

Steve Kuo 2009-12-03 01:51:05

I don't think overflowing matters - if the prime is large enough, the result will be large even after the overflow. I was thinking of something like 1327144003.

hstoerr 2009-12-03 06:16:51

Answer 4

A:

Collisions may not be such a big issue... The primary goal of the hash is to avoid using equals for 1:1 comparisons. If you have an implementation where equals is "generally" extremely cheap for objects that have collided hashes, then this is not an issue (at all).

In the end, the best way of hashing depends on what you are comparing. In the case of an int pair (as in your example), basic bitwise operators (such as & or ^) could be sufficient.
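As a minimal sketch of the bitwise approach for an int pair (the class and method names are hypothetical, chosen only for illustration), XOR alone is cheap but symmetric, so swapped pairs and equal pairs collide:

```java
// Hypothetical helper; names are illustrative and not taken from the thread.
final class XorPairHash {
    /** Combines two ints with XOR only: cheap, but symmetric in i and j. */
    static int hash(int i, int j) {
        return i ^ j;
    }

    public static void main(String[] args) {
        System.out.println(hash(3, 5) == hash(5, 3)); // true: swapped pairs collide
        System.out.println(hash(7, 7) == hash(0, 0)); // true: every (x, x) hashes to 0
    }
}
```

Whether that matters depends on how often such pairs occur in your data and how cheap equals is.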

Romain 2009-12-02 23:20:52

Of course it does not matter much, but changing the prime is an obvious and easy way to improve things. So why not do it?

hstoerr 2009-12-03 06:17:48

Agreed. I primarily meant to put a bit of emphasis on the fact that using primes is not the *only* way of doing things, as the question ultimately has a very "generic" scope.

Romain 2009-12-03 08:14:57

Answer 5

+6 A:

To give a meaningful answer to this, you have to know something about the possible values of i and j. The only thing I can think of in general is that in many cases small values will be more common than large values. (The odds of 15 appearing as a value in your program are much better than those of, say, 438281923.) So it seems a good idea to make the smallest hashcode collision as large as possible by choosing an appropriate prime. For 31 this is rather bad: already for i=-1 and j=31 you have the same hash value as for i=0 and j=0.
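The collision for 31 can be checked with a small sketch. It assumes the Eclipse-style scheme result = prime * result + field with a starting value of 1; the starting value and the class and method names are assumptions made for illustration.

```java
// Illustrative sketch; PairHash and hash(...) are hypothetical names.
final class PairHash {
    /** Eclipse-style combination: start at 1 (assumed), fold in i, then j. Overflow wraps. */
    static int hash(int prime, int i, int j) {
        int result = 1;
        result = prime * result + i;
        result = prime * result + j;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hash(31, 0, 0));   // 961
        System.out.println(hash(31, -1, 31)); // 961, same as for (0, 0)
    }
}
```

With this scheme two pairs collide exactly when prime * i + j agrees modulo 2^32, which is why the tiny pair (-1, 31) already collides with (0, 0) for prime 31.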

Since this is interesting, I've written a little program that searched the whole int range for the best prime in this sense. That is, for each prime I searched for the minimum value of Math.abs(i)+Math.abs(j) over all values of i,j that have the same hashcode as 0,0, and then took the prime where this minimum value is as large as possible.

Drumroll: the best prime in this sense is 486187739 (with the smallest collision being i=-25486, j=67194). Nearly as good and much easier to remember is 92821 with the smallest collision being i=-46272 and j=46016.
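The per-prime part of such a search could look roughly like the sketch below. It is not the original program: it assumes the Eclipse-style scheme, where (i, j) collides with (0, 0) exactly when prime * i + j is 0 in wrapping int arithmetic, and the class and method names are hypothetical.

```java
// Illustrative sketch of the search for one prime; names are hypothetical.
final class CollisionSearch {
    /**
     * Smallest |i| + |j| over pairs (i, j) != (0, 0) with prime * i + j == 0
     * in wrapping int arithmetic, i.e. pairs that hash like (0, 0).
     */
    static long smallestCollision(int prime) {
        long best = Long.MAX_VALUE;
        // (i, j) and (-i, -j) collide together, so positive i is enough.
        // A pair with i >= best cannot improve on best, which bounds the loop.
        for (long i = 1; i < best && i <= Integer.MAX_VALUE; i++) {
            int j = -(prime * (int) i);         // the only int j that collides for this i
            long cost = i + Math.abs((long) j); // widen first so abs cannot overflow
            if (cost < best) {
                best = cost;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(smallestCollision(31));    // 32, from i=1, j=-31
        System.out.println(smallestCollision(92821)); // 92288 = 46272 + 46016
    }
}
```

Running this over every prime in the int range and keeping the prime with the largest result would reproduce the kind of search described above; the prime enumeration itself is left out here.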

If you give "small" another meaning and want the minimum of Math.sqrt(i*i+j*j) over all collisions to be as large as possible, the results are a little different: the best would be 1322837333 with i=-6815 and j=70091, but my favourite 92821 (smallest collision -46272,46016) is again almost as good as the best value.

I do acknowledge that it is quite debatable whether these calculations make much sense in practice. But I do think that taking 92821 as the prime makes much more sense than 31, unless you have good reasons not to.

hstoerr 2010-05-12 07:26:44

You're looking for a magic number for a perfect hash, or a nearly perfect one at any rate. I'd be more interested in seeing a solution for arbitrary inputs up to the hash size (eg, 4 2-byte values in an 8 byte hashcode), than this particular case of simple transposition.

Jason 2010-05-12 21:20:54

8 byte hashcode? At least in Java this is 4 bytes. Anyway: you could just continue the scheme that is used in Eclipse hashCode generation: result = prime * result + i; result = prime * result + j; and so forth. For this, 92821 is probably a good choice as the prime, at least much better than the Eclipse default 31.

hstoerr 2010-05-18 08:53:14
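Continuing the scheme to more fields might look like the following sketch for four 2-byte values. The class Quad, its field names, and the starting value of 1 are assumptions made for illustration.

```java
// Illustrative sketch; Quad and its fields are hypothetical.
final class Quad {
    private static final int PRIME = 92821;
    private final short a, b, c, d;

    Quad(short a, short b, short c, short d) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
    }

    @Override
    public int hashCode() {
        int result = 1; // assumed starting value
        result = PRIME * result + a;
        result = PRIME * result + b;
        result = PRIME * result + c;
        result = PRIME * result + d;
        return result;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Quad)) return false;
        Quad q = (Quad) o;
        return a == q.a && b == q.b && c == q.c && d == q.d;
    }
}
```

Since four 2-byte values carry 64 bits and a Java hashCode has only 32, collisions between some inputs are unavoidable with any prime; the choice of prime only influences which inputs collide.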
