ansaurus

Question

Answer 1

A:

If your input hashes are the same size, evenly distributed and not related to each other then an XOR should be OK. Plus it's fast.

The situation I'm suggesting this for is where you want to do

H = hash(A) ^ hash(B); // A and B are different types, so there's no way A == B.

of course, if A and B can be expected to hash to the same value with a reasonable (non-negligible) probability, then you should not use XOR in this way.

geofftnz 2009-10-29 21:57:11

how would I tell whether my hash codes are evenly distributed, is there an easy benchmark to do for this? I know the collision rate is pretty low but does that necessarily correspond to an even distribution?

RobV 2009-10-29 22:04:20

Answer 2

+7 A:

I would personally avoid XOR - it means that any two equal values will result in 0 - so hash(1, 1) == hash(2, 2) == hash(3, 3) etc. Also hash(5, 0) == hash(0, 5) etc which may come up occasionally. I have deliberately used it for set hashing - if you want to hash a sequence of items and you don't care about the ordering, it's nice.

I usually use:

int hash = 17;
hash = hash * 31 + firstField.GetHashCode();
hash = hash * 31 + secondField.GetHashCode();
return hash;

That's the form that Josh Bloch suggests in Effective Java. Last time I answered a similar question I managed to find an article where this was discussed in detail - IIRC, no-one really knows why it works well, but it does. It's also easy to remember, easy to implement, and easy to extend to any number of fields.

Jon Skeet 2009-10-29 22:11:04

Yeah that was my concern about XORing, in the type of data I'm pairing it's fairly unlikely to be pairing too equal items but not impossible

RobV 2009-10-29 22:21:23

Looks like Dan Bernstein's (or Chris Torek's) hash, just with different constants. Nobody knows why that works well either.

ephemient 2009-10-29 22:27:09

@RobV: I prefer not to have to think if I don't have to. I use this hash even when I *could* get away with XOR, just to avoid having to wonder whether it's safe or not :)

Jon Skeet 2009-10-29 22:48:08

what are the firstField and secondField that you talk about?

SuperString 2009-12-28 21:23:06

@SuperString: the first and second fields you want to hash. You then keep going for the other fields.

Jon Skeet 2009-12-28 21:46:37

A word of warning, this is (a variation of) the Berstein hash, and because nobody knows why it does well in tests it is not advisable when hashing is critical. See http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx. Also, you should wrap this code in an `unchecked { }` block. GetHashCode() should not throw any exceptions.

Henk Holterman 2009-12-31 16:38:57

Answer 3

A:

I would recommend using the built-in hash functions in System.Security.Cryptography rather than rolling your own.

richardtallent 2009-10-29 23:05:58

No, They have a very different purpose and break the rule that GetHashCode should be fast.

Henk Holterman 2009-10-30 10:20:00

Answer 4

A:

If you're looking for speed and don't have too many collisions, then XOR is fastest. To prevent a clustering around zero, you could do something like this:

finalHash = hash1 ^ hash2;
return finalHash != 0 ? finalHash : hash1;

Of course, some prototyping ought to give you an idea of performance and clustering.

ebpower 2009-10-30 00:34:09

ansaurus

tags:

views:

answers:

Quick and Simple Hash Code Combinations

related questions