views:

1822

answers:

5

Does anyone know or have an idea on how the default implementation for GetHashCode() works? And does it handle structures, classes, arrays, etc efficiently and well enough?

Trying to decide under what cases I should pack my own and what cases I can safely rely on the default implementation to do well. Don't want to reinvent the wheel if possible.

A: 

Generally speaking, if you're overriding Equals, you want to override GetHashCode. The reason for this is because both are used to compare equality of your class/struct.

Equals is used when checking Foo A, B;

if (A == B)

Since we know the pointer isn't likely to match, we can compare the internal members.

Equals(obj o)
{
    if (o == null) return false;
    MyType Foo = o as MyType;
    if (Foo == null) return false;
    if (Foo.Prop1 != this.Prop1) return false;

    return Foo.Prop2 == this.Prop2;
}

GetHashCode is generally used by hash tables. The hashcode generated by your class should always be the same for a classes give state.

I typically do,

GetHashCode()
{
    int HashCode = this.GetType().ToString().GetHashCode();
    HashCode ^= this.Prop1.GetHashCode();
    etc.

    return HashCode;
}

Some will say that the hashcode should only be calculated once per object lifetime, but I don't agree with that (and I'm probably wrong).

Using the default implementation provided by object, unless you have the same reference to one of your classes, they will not be equal to each other. By overriding Equals and GetHashCode, you can report equality based on internal values rather than the objects reference.

Ben
The ^= approach is not a particularly good approach for generating a hash - it tends to lead to a lot of common/predictable collisions - for example if Prop1 = Prop2 = 3.
Marc Gravell
If the values are the same, I don't see a problem with the collision as the objects are equal. The 13 * Hash + NewHash seems interesting though.
Ben
Ben: try it for Obj1 { Prop1 = 12, Prop2 = 12 } and Obj2 { Prop1 = 13, Prop2 = 13 }
Tomáš Kafka
+11  A: 
namespace System {
    public class Object {
        [MethodImpl(MethodImplOptions.InternalCall)]
        internal static extern int InternalGetHashCode(object obj);

        public virtual int GetHashCode() {
            return InternalGetHashCode(this);
        }
    }
}

InternalGetHashCode is mapped to an ObjectNative::GetHashCode function in the CLR, which looks like this:

FCIMPL1(INT32, ObjectNative::GetHashCode, Object* obj) {  
    CONTRACTL  
    {  
        THROWS;  
        DISABLED(GC_NOTRIGGER);  
        INJECT_FAULT(FCThrow(kOutOfMemoryException););  
        MODE_COOPERATIVE;  
        SO_TOLERANT;  
    }  
    CONTRACTL_END;  

    VALIDATEOBJECTREF(obj);  

    DWORD idx = 0;  

    if (obj == 0)  
        return 0;  

    OBJECTREF objRef(obj);  

    HELPER_METHOD_FRAME_BEGIN_RET_1(objRef);        // Set up a frame  

    idx = GetHashCodeEx(OBJECTREFToObject(objRef));  

    HELPER_METHOD_FRAME_END();  

    return idx;  
}  
FCIMPLEND

The full implementation of GetHashCodeEx is fairly large, so it's easier to just link to the C++ source code.

To expand on that a little, here's an excerpt from the documentation:

The default implementation returns an index for the object determined by the common language runtime. The index is unique to an instance of an object within an AppDomain for an instance of the executing engine. However, because this index can be reused after the object is reclaimed during garbage collection, it is possible to obtain the same hash code for two different objects. Also, two objects that represent the same value have the same hash code only if they are the exact same object. This implementation is not particularly useful for hashing; therefore, derived classes should override GetHashCode.

So basically, if you're not familiar with the CLR source code, there's really no point in figuring out the exact implementation, because the best practice is to override GetHashCode anyway.

David Brown
See also http://stackoverflow.com/questions/1139767/object-gethashcode
ChrisW
That documentation quote must have come from a very early version. It is no longer written like this in current MSDN articles, probably because it is quite wrong.
Hans Passant
They changed the wording, yes, but it still says basically the same thing: "Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes."
David Brown
+1  A: 

The documentation for the GetHashCode method for Object says "the default implementation of this method must not be used as a unique object identifier for hashing purposes." and the one for ValueType says "If you call the derived type's GetHashCode method, the return value is not likely to be suitable for use as a key in a hash table.".

The basic data types like byte, short, int, long, char and string implement a good GetHashCode method. Some other classes and structures, like Point for example, implement a GetHashCode method that may or may not be suitable for your specific needs. You just have to try it out to see if it's good enough.

The documentation for each class or structure can tell you if it overrides the default implementation or not. If it doesn't override it you should use your own implementation. For any classes or structs that you create yourself where you need to use the GetHashCode method, you should make your own implementation that uses the appropriate members to calculate the hash code.

Guffa
I'd disagree that you should *routinely* add your own implementation. Simply, the vast majority of classes (in particular) will never be tested for equality - or where they are, the inbuilt reference equality is fine. In the (already rare) occasion of writing a struct, it would be more common, true.
Marc Gravell
@Marc Gravel: That's of course not what I meant it to say. I will adjust the last paragraph. :)
Guffa
+4  A: 

For a class, the defaults are essentially reference equality, and that is usually fine. If writing a struct, it is more common to override equality (not least to avoid boxing), but it is very rare you write a struct anyway!

When overriding equality, you should always have a matchine Equals() and GetHashCode() (i.e. for two values, if Equals() returns true they must return the same hash-code, but the converse is not required) - and it is common to also provide ==/!= operators, and often to implement IEquatable<T> too.

For generating the hash code, it is common to use a factored sum, as this avoids collisions on paired values - for example, for a basic 2 field hash:

int hash = 27;
hash = (13 * hash) + field1.GetHashCode();
hash = (13 * hash) + field2.GetHashCode();
return hash;

This has the advantage that:

  • the hash of {1,2} is not the same as the hash of {2,1}
  • the hash of {1,1} is not the same as the hash of {2,2}

etc - which can be common if just using an unweighted sum, or xor (^), etc.

Marc Gravell
A: 

Have a look at the comment I left on the article: http://stackoverflow.com/questions/763731/gethashcode-extension-method

Paul Westcott