views:

335

answers:

4

Can people recommend quick and simple ways to combine the hash codes of two objects. I am not too worried about collisions since I have a Hash Table which will handle that efficiently I just want something that generates a code quickly as possible.

Reading around SO and the web there seem to be a few main candidates:

  1. XORing
  2. XORing with Prime Multiplication
  3. Simple numeric operations like multiplication/division (with overflow checking or wrapping around)
  4. Building a String and then using the String classes Hash Code method

What would people recommend and why?

A: 

If your input hashes are the same size, evenly distributed and not related to each other then an XOR should be OK. Plus it's fast.

The situation I'm suggesting this for is where you want to do

H = hash(A) ^ hash(B); // A and B are different types, so there's no way A == B.

of course, if A and B can be expected to hash to the same value with a reasonable (non-negligible) probability, then you should not use XOR in this way.

geofftnz
how would I tell whether my hash codes are evenly distributed, is there an easy benchmark to do for this? I know the collision rate is pretty low but does that necessarily correspond to an even distribution?
RobV
+7  A: 

I would personally avoid XOR - it means that any two equal values will result in 0 - so hash(1, 1) == hash(2, 2) == hash(3, 3) etc. Also hash(5, 0) == hash(0, 5) etc which may come up occasionally. I have deliberately used it for set hashing - if you want to hash a sequence of items and you don't care about the ordering, it's nice.

I usually use:

int hash = 17;
hash = hash * 31 + firstField.GetHashCode();
hash = hash * 31 + secondField.GetHashCode();
return hash;

That's the form that Josh Bloch suggests in Effective Java. Last time I answered a similar question I managed to find an article where this was discussed in detail - IIRC, no-one really knows why it works well, but it does. It's also easy to remember, easy to implement, and easy to extend to any number of fields.

Jon Skeet
Yeah that was my concern about XORing, in the type of data I'm pairing it's fairly unlikely to be pairing too equal items but not impossible
RobV
Looks like Dan Bernstein's (or Chris Torek's) hash, just with different constants. Nobody knows why that works well either.
ephemient
@RobV: I prefer not to have to think if I don't have to. I use this hash even when I *could* get away with XOR, just to avoid having to wonder whether it's safe or not :)
Jon Skeet
what are the firstField and secondField that you talk about?
SuperString
@SuperString: the first and second fields you want to hash. You then keep going for the other fields.
Jon Skeet
A word of warning, this is (a variation of) the Berstein hash, and because nobody knows why it does well in tests it is not advisable when hashing is critical. See http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx. Also, you should wrap this code in an `unchecked { }` block. GetHashCode() should not throw any exceptions.
Henk Holterman
A: 

I would recommend using the built-in hash functions in System.Security.Cryptography rather than rolling your own.

richardtallent
No, They have a very different purpose and break the rule that GetHashCode should be fast.
Henk Holterman
A: 

If you're looking for speed and don't have too many collisions, then XOR is fastest. To prevent a clustering around zero, you could do something like this:

finalHash = hash1 ^ hash2;
return finalHash != 0 ? finalHash : hash1;

Of course, some prototyping ought to give you an idea of performance and clustering.

ebpower