I use some identity classes/structs that contain one or two ints, and maybe a DateTime or a small string as well. I use these as keys in a dictionary.

What would be a good override of GetHashCode for something like this? Ideally something fairly simple but still reasonably performant.

Thanks

A: 

Take a look at Essential C#.

It contains a detailed description of how to override GetHashCode() correctly.

Extract from the book:

The purpose of the hash code is to efficiently balance a hash table by generating a number that corresponds to the value of an object.

  • Required: Equal objects must have equal hash codes (if a.Equals(b), then a.GetHashCode() == b.GetHashCode())
  • Required: The value returned by GetHashCode() should be constant over the life of a particular object, even if the object's data changes. In many cases, you should cache the method's return value to enforce this.
  • Required: GetHashCode() should not throw any exceptions; GetHashCode() must always successfully return a value.
  • Performance: Hash codes should be unique whenever possible. However, since a hash code is only an int, there has to be an overlap in hash codes for objects that have potentially more values than an int can hold, which is virtually all types. (An obvious example is long, since there are more possible long values than an int could uniquely identify.)
  • Performance: The possible hash code values should be distributed evenly over the range of an int. For example, creating a hash that ignores the fact that strings in Latin-based languages consist mostly of the first 128 ASCII characters would result in a very uneven distribution of string values and would not be a strong GetHashCode() algorithm.
  • Performance: GetHashCode() should be optimized for performance. GetHashCode() is generally used in Equals() implementations to short-circuit a full equals comparison if the hash codes are different. As a result, it is frequently called when the type is used as a key type in dictionary collections.
  • Performance: Small differences between two objects should result in large differences between hash code values; ideally, a 1-bit difference in the object results in around 16 bits of the hash code changing, on average. This helps ensure that the hash table remains balanced no matter how it is "bucketing" the hash values.
  • Security: It should be difficult for an attacker to craft an object that has a particular hash code. The attack is to flood a hash table with large amounts of data that all hash to the same value. The hash table implementation then becomes O(n) instead of O(1), resulting in a possible denial-of-service attack.
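As a minimal sketch of how the required points above can be met for the kind of key described in the question, an immutable class can compute its hash once from exactly the fields that `Equals` compares. This assumes .NET Core 2.1 or later (for `System.HashCode`) and C# 9 (for `is not null`); the `OrderKey` type and its members are hypothetical.

```csharp
using System;

// Hypothetical immutable identity key: two ints, a DateTime and an optional string.
public sealed class OrderKey : IEquatable<OrderKey>
{
    public int CustomerId { get; }
    public int OrderNumber { get; }
    public DateTime Created { get; }
    public string? Region { get; }

    // Computed once and cached. Because every field is read-only, the cached
    // value can never drift out of sync with Equals.
    private readonly int _hash;

    public OrderKey(int customerId, int orderNumber, DateTime created, string? region)
    {
        CustomerId = customerId;
        OrderNumber = orderNumber;
        Created = created;
        Region = region;
        // HashCode.Combine treats a null argument as 0, so this never throws.
        _hash = HashCode.Combine(customerId, orderNumber, created, region);
    }

    // Compares the same fields that feed the hash, so equal keys get equal hashes.
    public bool Equals(OrderKey? other) =>
        other is not null &&
        CustomerId == other.CustomerId &&
        OrderNumber == other.OrderNumber &&
        Created == other.Created &&
        string.Equals(Region, other.Region, StringComparison.Ordinal);

    public override bool Equals(object? obj) => Equals(obj as OrderKey);

    public override int GetHashCode() => _hash;
}
```

Caching is only safe here because the key is immutable; `string.GetHashCode()` is ordinal, which matches the ordinal comparison in `Equals`.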

As mentioned there, you also have to consider some points about overriding Equals(), and the book includes code examples showing how to implement these two functions.

So this information should give you a starting point, but I recommend buying the book and reading all of chapter 9 (at least the first twelve pages) to get all the points on how to correctly implement these two crucial functions.

Oliver
Could you include some insight from the book perhaps?
NickLarsen
-1. Since when is buying a 1000-page book an answer? I'm sure it's a great book, but this answer is like "google it" or "RTFM".
Jakob
@Jakob: The problem with `GetHashCode()` is that it is more complicated than you would think at first. You have to consider a lot of points to get it to work flawlessly, and I think that's just too much to write down here in a few words.
Oliver
So, I made an update to give some insight. Hopefully this leads to reverting the downvotes. ;-)
Oliver
The book is wrong. If the object defines equality in such a way that it can change over the course of its lifetime, then the hash code MUST change to match. It's not a good idea to let keys change, but ensuring that isn't the class creator's responsibility (though if the class is immutable, they may certainly indicate that this makes it a good key choice). The hash code mustn't change if identity is what defines equality (the default), but then the default implementation is best anyway.
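A small sketch of the consequence described in that comment, using a hypothetical `MutableKey` class: the hash follows the mutable `Id` so that it stays consistent with `Equals`, and mutating a key while it is stored in a `Dictionary` leaves the entry unreachable (the dictionary still holds the old hash code).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key whose equality, and therefore hash, depends on Id.
public sealed class MutableKey : IEquatable<MutableKey>
{
    public int Id { get; set; }

    public MutableKey(int id) => Id = id;

    public bool Equals(MutableKey? other) => other is not null && Id == other.Id;
    public override bool Equals(object? obj) => Equals(obj as MutableKey);

    // Must follow Id: caching it would let two equal keys report different hashes.
    public override int GetHashCode() => Id;
}

public static class MutableKeyDemo
{
    // Returns whether the dictionary can still find the key after mutating it.
    public static bool StillFoundAfterMutation()
    {
        var key = new MutableKey(1);
        var map = new Dictionary<MutableKey, string> { [key] = "value" };
        key.Id = 2; // hash changes from 1 to 2 while the key is stored
        return map.ContainsKey(key); // looks in the bucket for hash 2: not found
    }
}
```

The contract is satisfied, yet the entry is effectively lost, which is why immutable types make better keys.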
Jon Hanna
It's also overstating the security aspect. This is only possible if outside input can influence the values hashed, in which case it may be necessary to override the default hash with an equality comparer. To truly guarantee security here (rather than rely on obscurity) requires more expense than can be justified in all cases.
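Supplying an equality comparer to the dictionary, as suggested in that comment, might look like the following sketch. `UserKey` and `UserKeyComparer` are hypothetical. It assumes .NET Core 2.1 or later, where `HashCode.Combine` uses a per-process random seed; that makes hash values harder to predict from outside, but it is not by itself a guarantee against hash flooding.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose values may come from outside input.
public readonly struct UserKey
{
    public UserKey(int tenant, string name)
    {
        Tenant = tenant;
        Name = name;
    }

    public int Tenant { get; }
    public string Name { get; }
}

// Comparer handed to the dictionary instead of relying on the key's own hash.
public sealed class UserKeyComparer : IEqualityComparer<UserKey>
{
    // Ordinal comparison; string.Equals also copes with a null Name (e.g. default(UserKey)).
    public bool Equals(UserKey x, UserKey y) =>
        x.Tenant == y.Tenant && string.Equals(x.Name, y.Name, StringComparison.Ordinal);

    // Randomly seeded per process, so values differ between runs; null Name hashes as 0.
    public int GetHashCode(UserKey key) => HashCode.Combine(key.Tenant, key.Name);
}
```

The comparer is passed through the existing `Dictionary<TKey, TValue>(IEqualityComparer<TKey>)` constructor, e.g. `new Dictionary<UserKey, int>(new UserKeyComparer())`, so the key type itself needs no changes.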
Jon Hanna
A:

The accepted answer to this SO question is the technique I use.

http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode
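The technique usually cited from that question is, as far as I know, a multiply-and-add scheme: start from a non-zero constant, then for each field multiply by a small prime and add the field's hash, letting overflow wrap. Check the link for the original; the sketch below applies that idea to a hypothetical `EventKey` struct shaped like the keys in the question, using the commonly quoted constants 17 and 23.

```csharp
using System;

// Hypothetical identity struct: two ints, a DateTime and an optional string.
public readonly struct EventKey : IEquatable<EventKey>
{
    public EventKey(int source, int sequence, DateTime timestamp, string? tag)
    {
        Source = source;
        Sequence = sequence;
        Timestamp = timestamp;
        Tag = tag;
    }

    public int Source { get; }
    public int Sequence { get; }
    public DateTime Timestamp { get; }
    public string? Tag { get; }

    public bool Equals(EventKey other) =>
        Source == other.Source &&
        Sequence == other.Sequence &&
        Timestamp == other.Timestamp &&
        string.Equals(Tag, other.Tag, StringComparison.Ordinal);

    public override bool Equals(object? obj) => obj is EventKey other && Equals(other);

    public override int GetHashCode()
    {
        unchecked // overflow is expected here; let the arithmetic wrap
        {
            int hash = 17;                                 // arbitrary non-zero seed
            hash = hash * 23 + Source;                     // multiply by a prime, add next field
            hash = hash * 23 + Sequence;
            hash = hash * 23 + Timestamp.GetHashCode();
            hash = hash * 23 + (Tag?.GetHashCode() ?? 0);  // null-safe, never throws
            return hash;
        }
    }
}
```

Unlike simply XOR-ing the field hashes, this is order-sensitive: `(1, 2, ...)` and `(2, 1, ...)` produce different hash codes.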

Matt Davis