views: 135

answers: 3

I have a vector class with hashCode() implemented. I didn't write it myself; it multiplies the two vector components by two prime numbers and XORs the results. Here it is:

    /*class Vector2f*/
...
    public int hashCode()
    {
        return 997 * ((int)x) ^ 991 * ((int)y); //large primes! 
    }

...As this is from an established Java library, I know that it works just fine.

Then I have a Boundary class, which holds 2 vectors, "start" and "end" (representing the endpoints of a line). The values of these 2 vectors are what characterize the boundary.

    /*class Boundary*/
...
    public int hashCode()
    {
        return 1013 * (start.hashCode()) ^ 1009 * (end.hashCode());
    }

Here I have attempted to create a good hashCode() for the unique 2-tuple of vectors (start & end) constituting this boundary. My question: Is this hashCode() implementation going to work?

(Note that I have used 2 different prime numbers in the latter hashCode() implementation; I don't know if this is necessary but better to be safe than sorry when trying to avoid common factors, I guess -- since I presume this is why primes are popular for hashing functions.)

+1  A: 

That's the normal practice. It looks pretty reasonable to me. If you're using Eclipse, you should find that it can generate equals and hashCode for you—just check the Source menu. It will do the same thing—enumerate your fields and create an equals method that checks all of them, then choose n prime numbers and do what you've done to create a hashCode method.
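For reference, here is a sketch of the null-safe, 31-based equals()/hashCode() pair that Eclipse typically generates. The Boundary class and its start/end fields are taken from the question; the Vector2f shown is a minimal stand-in for the library class, not its actual source.

```java
// Sketch of the equals()/hashCode() pair Eclipse's
// Source > Generate hashCode() and equals() typically produces.
class Vector2f {
    final float x, y;

    Vector2f(float x, float y) { this.x = x; this.y = y; }

    @Override
    public int hashCode() {
        return 997 * ((int) x) ^ 991 * ((int) y); // as in the question
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Vector2f)) return false;
        Vector2f other = (Vector2f) obj;
        return Float.floatToIntBits(x) == Float.floatToIntBits(other.x)
            && Float.floatToIntBits(y) == Float.floatToIntBits(other.y);
    }
}

class Boundary {
    final Vector2f start, end;

    Boundary(Vector2f start, Vector2f end) {
        this.start = start;
        this.end = end;
    }

    // Eclipse-style: fold each field into the result with the prime 31.
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((start == null) ? 0 : start.hashCode());
        result = prime * result + ((end == null) ? 0 : end.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Boundary)) return false;
        Boundary other = (Boundary) obj;
        return (start == null ? other.start == null : start.equals(other.start))
            && (end == null ? other.end == null : end.equals(other.end));
    }
}
```

Equal boundaries then report equal hash codes, which is what the hashCode() contract requires.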

Samir Talwar
Thanks Samir, I didn't know that Eclipse had this.
Nick Wiggill
+1  A: 

The reason for using prime numbers (they don't necessarily have to be "large" prime numbers) is indeed to avoid common factors.

Hash codes are used by hash-based collection classes such as HashSet and HashMap. They work best if the hash codes of the objects they contain are as dissimilar as possible; they have to do more work to distinguish objects whose hash codes are the same.

Multiplying each part's hash code by a different prime before combining them into a single hash code ensures that the parts' contributions don't share common factors, so there's less chance of the hash codes of different parts overlapping each other and colliding.
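To make the "overlapping parts" point concrete, here is a small demo reusing the question's constants: a bare XOR of the two part hashes is symmetric, so it is guaranteed to collide when start and end are swapped, while the prime-weighted combination distinguishes them for these values.

```java
// Why each part gets its own prime multiplier: plain XOR of the part
// hashes is order-insensitive, so swapped endpoints always collide.
public class PrimeMixDemo {
    public static void main(String[] args) {
        int s = 997 * 3 ^ 991 * 4; // Vector2f(3, 4).hashCode()
        int e = 997 * 7 ^ 991 * 1; // Vector2f(7, 1).hashCode()

        // Boundary (s, e) vs. boundary (e, s), combined without multipliers:
        System.out.println((s ^ e) == (e ^ s)); // true: always collides

        // The question's combination, with a prime per part:
        int mixed1 = 1013 * s ^ 1009 * e;
        int mixed2 = 1013 * e ^ 1009 * s;
        System.out.println(mixed1 == mixed2);   // false: distinct here
    }
}
```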

Jesper
Just so I get this right, is it then recommended that I do use a *different pair* of primes in Boundary.hashCode() as compared with the pair I use in the underlying Vector2f.hashCode() implementation? Or would it be all the same if I used 997 and 991 in both? AFAIK, since the XOR is twiddling bits, the integer resulting from that XOR isn't necessarily going to be a prime (or a non-prime) anyway, meaning it shouldn't matter if I use the same pair of primes?
Nick Wiggill
I'm not entirely sure; if + were used instead of XOR, then it would be clear: you should use different primes. So I'd do that just to be safe. Note that XOR is almost like adding numbers, but without carrying bits (i.e. `1 ^ 1 = 0`, not `10` with a bit carried forward). I haven't thought through exactly how that affects the result.
Jesper
@Nick Wiggill I believe the important thing is that the prime being multiplied by not be a factor of the size of the hash table's underlying array. If you multiply by 17, and the array's size is 34, then all objects will end up at position 0 or 17 (causing lots of collisions).
ILMTitan
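A quick sketch of that scenario, using the multiplier 17 and table size 34 from the comment: every product 17*k lands in bucket 0 (k even) or bucket 17 (k odd).

```java
// When the multiplier shares a factor with the table size, entries pile
// into a few buckets: 17 * k mod 34 can only ever be 0 or 17.
public class BucketDemo {
    public static void main(String[] args) {
        int tableSize = 34;
        java.util.Set<Integer> bucketsUsed = new java.util.TreeSet<>();
        for (int k = 0; k < 100; k++) {
            bucketsUsed.add((17 * k) % tableSize);
        }
        System.out.println(bucketsUsed); // [0, 17]
    }
}
```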
@Jesper There is one main danger of using ^ instead of +: repeated occurrences of the same element in a series of XORs cancel each other out (N ^ Y ^ N = Y). This is mitigated by the fact that the accumulated value is multiplied at every step.
ILMTitan
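A minimal demonstration of that cancellation, and of how a multiply-accumulate combination (the common 31-based recipe) avoids it:

```java
// Repeating a value an even number of times in a XOR chain erases it
// (n ^ y ^ n == y). Multiply-then-add makes each field's contribution
// depend on its position, so repeats no longer cancel.
public class XorCancelDemo {
    public static void main(String[] args) {
        int n = 997, y = 12345;

        // Plain XOR: the two occurrences of n cancel, leaving just y.
        System.out.println((n ^ y ^ n) == y); // true

        // Multiply-accumulate over the same three fields (n, y, n):
        int result = 1;
        result = 31 * result + n;
        result = 31 * result + y;
        result = 31 * result + n;
        System.out.println(result == y);      // false
    }
}
```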
A: 

You can find a sample chapter of Effective Java here which gives great examples of creating hash codes, using primes, etc.

krock