views: 428

answers: 4
+1  Q: 

Object.GetHashCode

+2  A: 

Two objects which co-exist in the same AppDomain at the same time won't necessarily share a hash code, but an object which is created, prints out its hash code and is then garbage collected may well share a hash code with another object created later.

In fact, I'm not sure where that documentation comes from - the docs for object.GetHashCode show this:

The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.

Note the first part about there being no guarantee of uniqueness for different objects.

As for the bit about it not being a good hash code - that doesn't feel particularly significant to me. If you only want reference equality, I think you're fine to not override GetHashCode/Equals.

You may get some hash collisions, but IMO it's very unlikely to be significant. The docs could certainly be clearer (I suspect the part about the method not being used as a unique object identifier for hashing purposes is talking about hashing for security) but I think you should be fine.
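For illustration, a minimal sketch (the class and its property are invented here) of using a reference-equality type as a dictionary key without overriding anything:

    using System;
    using System.Collections.Generic;

    // Hypothetical class: no Equals/GetHashCode overrides, so it gets
    // reference equality and the default hash code.
    class Session
    {
        public string User { get; set; }
    }

    class Program
    {
        static void Main()
        {
            var lookup = new Dictionary<Session, int>();

            var a = new Session { User = "alice" };
            var b = new Session { User = "alice" }; // same data, different reference

            lookup[a] = 1;

            Console.WriteLine(lookup.ContainsKey(a)); // True  - same reference
            Console.WriteLine(lookup.ContainsKey(b)); // False - different reference
        }
    }

Two instances holding the same data are still different keys, which is exactly the reference-equality behaviour being discussed.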

Jon Skeet
Sigh... "beat by the skeet" again... :-)
McWafflestix
By now Jon Skeet answers should just be auto accepted by the SO software.
cfeduke
That documentation warns, "Providing a good hash function on a class can significantly affect the performance of adding those objects to a hash table." If you say that I'm fine to not override it, is the default Object.GetHashCode therefore a "good" hash function? So, it's fine to use an arbitrary class as the key of a Dictionary<,> even when the dictionary is large and accessed often?
ChrisW
A: 

You do not actually need to modify anything on a class which requires only reference equality.

Also, formally speaking, that is not a good implementation because it has poor distribution. A hash function should spread its results reasonably evenly, since that improves how objects are distributed across hash buckets and, indirectly, the performance of collections that use hash tables. As I said, that is a formal point, one of the guidelines for designing a hash function.

kek444
What's poor about the distribution? If, to map hashes to buckets, you divide the hash by the number of buckets and use the remainder, it seems to me that my implementation would distribute the objects equally/evenly over all buckets.
ChrisW
That would be true if that was the hash bucket mapping algorithm. See http://www.concentric.net/~Ttwang/tech/inthash.htm for example. Quote: "*For a hash function, the distribution should be uniform. This implies when the hash result is used to calculate hash bucket address, all buckets are equally likely to be picked. In addition, similar hash keys should be hashed to very different hash results. Ideally, a single bit change in the hash key should influence all bits of the hash result.*"
kek444
+9  A: 

The most important property a hash code implementation must have is this:

If two objects compare as equal then they must have identical hash codes.

If you have a class where instances of the class compare by reference equality, then you do not need to override GetHashCode; the default implementation guarantees that two objects that are the same reference have the same hash code. (You're calling the same method twice on the same object, so of course the result is the same.)

If you have written a class which implements its own equality that is different from reference equality then you are REQUIRED to override GetHashCode such that two objects that compare as equal have equal hash codes.

Now, you could do so by simply returning zero every time. That would be a lousy hash function, but it would be legal.
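As a minimal sketch (the type and its fields are invented for illustration), a class that defines its own value equality has to derive its hash code from the same fields it compares:

    using System;

    // Hypothetical type: two Points compare equal when their coordinates match,
    // so the hash code must be computed from those same coordinates.
    sealed class Point
    {
        public int X { get; }
        public int Y { get; }

        public Point(int x, int y) { X = x; Y = y; }

        public override bool Equals(object obj) =>
            obj is Point other && X == other.X && Y == other.Y;

        // Equal points produce equal hash codes, as required.
        // (Returning 0 here would also be legal - just lousy for hash tables.)
        public override int GetHashCode() => (X * 397) ^ Y;
    }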

Other properties of good hash functions are:

  • GetHashCode should never throw an exception

  • Mutable objects which compare for equality on their mutable state, and therefore hash on their mutable state, are dangerously bug-prone. You can put an object into a hash table, mutate it, and be unable to get it out again. Try to never hash or compare for equality on mutable state (the sketch after this list shows the failure mode).

  • GetHashCode should be extremely fast -- remember, the purpose of a good hash algorithm is to improve the performance of lookups. If the hash is slow then the lookups can't be made fast.

  • Objects which do not compare as equal should have dissimilar hash codes, well distributed over the whole range of a 32 bit integer
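A short sketch of the mutable-state hazard from the list above (the class and its field are hypothetical):

    using System;
    using System.Collections.Generic;

    // Hypothetical class that (unwisely) compares and hashes on a mutable field.
    class Box
    {
        public int Value;

        public override bool Equals(object obj) => obj is Box b && b.Value == Value;
        public override int GetHashCode() => Value;
    }

    class Program
    {
        static void Main()
        {
            var set = new HashSet<Box>();
            var box = new Box { Value = 1 };

            set.Add(box);   // stored in the bucket for hash code 1
            box.Value = 2;  // mutating the key changes its hash code

            // The set searches the bucket for hash code 2 and misses,
            // even though the very same object is still inside it.
            Console.WriteLine(set.Contains(box)); // False
        }
    }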

Eric Lippert
Would it be true, or too optimistic, to read that as, "the default object.GetHashCode is a good implementation of GetHashCode for types which use reference equality, i.e. for classes and/or interfaces (but not structs) for which you have not overridden object.Equals; where 'good implementation' means good/speedy performance when using these types as the key of a Dictionary"?
ChrisW
Yes, the default implementation of GetHashCode that we provide for you is a pretty good one.
Eric Lippert
+1  A: 

Question:

Is this true? It seems to me that two objects won't have the same hash code, because an object's hash code isn't reused until the object is garbage collected (i.e. no longer exists).

Two objects may share the same hash code, if it is generated by the default GetHashCode implementation, because:

  1. The default GetHashCode result shouldn't change during an object's lifetime, and the default implementation ensures this. If it could change, types such as Hashtable couldn't cope with it, because the default hash code is expected to behave like a hash of a unique instance identifier (even though there is no such identifier :) ).
  2. The range of GetHashCode values is the range of a 32-bit integer (2^32 possible values).

Conclusion: it's enough to allocate 2^32 strongly referenced objects (which should be easy on Win64) to exhaust the range - beyond that, hash codes must repeat.
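As an illustration of point 1 (a small, hypothetical sketch; RuntimeHelpers.GetHashCode exposes the default identity hash directly):

    using System;
    using System.Runtime.CompilerServices;

    // Hypothetical class with no GetHashCode override, so it uses the default.
    class Node
    {
        public int Payload;
    }

    class Program
    {
        static void Main()
        {
            var node = new Node { Payload = 1 };

            int before = node.GetHashCode();
            node.Payload = 42;               // mutating fields...
            int after = node.GetHashCode();  // ...does not change the default hash

            Console.WriteLine(before == after); // True: stable for the object's lifetime

            // RuntimeHelpers.GetHashCode returns the same default identity hash.
            Console.WriteLine(RuntimeHelpers.GetHashCode(node) == before); // True
        }
    }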

Finally, there is an explicit statement in the object.GetHashCode reference on MSDN: The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.

Alex Yakunin