views:

317

answers:

4

In .NET you need that Equals(object) and GetHashCode() are compatible. But sometimes you can't:

public class GreaterThan32Bits
{
    public int X { get; set; }
    public int Y { get; set; }
}

Because the data density is greater than 32 bits, and GetHashCode returns an Int32, you will have 3 solutions (assuming a correctly implemented GetHashCode):

  1. Avoid code duplication discarded as incorrect

    public override bool Equals(object other)
    {
        if(ReferenceEquals(null, other)) return false;
        if(ReferenceEquals(this, other)) return true;
        return this.GetHashCode() == other.GetHashCode();
    }
    
  2. Implement Equals separately from GetHashCode()

    public override bool Equals(object obj)
    {
        if(ReferenceEquals(null, other)) return false;
        if(ReferenceEquals(this, other)) return true;
        var other = obj as GreaterThan32Bits;
        if(this.X == other.X) return this.Y == other.Y;
        return false;
    }
    
  3. Implement a greater precision GetHashCode64, the overrided GetHashCode (32 bits) will return (int)GetHashCode64(), and Equals will return this.GetHashCode64() == other.GetHashCode64()

Which one would you implement?

The first solution is imprecise incorrect, but cleaner. The second option seems clean, but gets very complex when the class has more properties. The third option is a compromise.

+1  A: 

I think the key that you're missing is that GetHashCode() doesn't have to return unique values.

Its perfectly acceptable for two different objects to return the same GetHashCode. Say you add two objects to a HashSet that have the same HashCode, then the container will use the GetHashCode first to find where approximately in the HashSet the object is, then use the equals on all the matching objects to find your exact object.

Obviously its better if each object has a unique hash code. If every object returned the same hashCode then performance would be horrible.

Alex Black
Re-reading your question: I think you should just keep the GetHashCode and Equals methods separate. GetHashCode() should do exactly that: generate a hashcode. Equals should compare the two objects for equality.
Alex Black
+1 for answering the question with this comment
Jader Dias
+4  A: 

Your first implementation is not correct. The hash code of two objects might be equal even though the objects themselves are not equal: This is the essence of a hash code.

An object hash code may be useful for determining when two objects are not equal, but to determine whether they are equal you will have to call .Equals().

An implementation that always returns 0 for GetHashCode() is legal, but may not be very efficient when objects of that type are inserted int various types of containers.

Your option 2 is the best choice. It's a good idea to keep the implementation of Equals() separate from GetHashCode(), because they do quite different things. Equals() must return true if and only if the two objects are equal in all respects. To do this, you will normally have to check each object property individually.

Greg Hewgill
About the incorrectness: I know, that's why I called it imprecise.
Jader Dias
+2  A: 

Strictly speaking first solution doesn't work. It is not a solution then.

The idea of hashing is quite different. Int32 is pretty enough for that purposes.

The suggested GetHashCode() is

return X ^ Y;

Simply as it is.

EDIT: Equals methods then can use GetHashCode(), but only to return false when hashes differ. Deep comparison is required anyway.

modosansreves
The question isn't about GetHashCode, but about Equals
Jader Dias
+3  A: 

The requirement is as follows: if (a.Equals(b)), then a.GetHashCode() == b.GetHashCode()

Not the other way round.

You shouldn't implement Equals() in term of GetHashCode(), ever. It's perfectly valid for GetHashCode to have collisions, but Equals() must not return false positives.

I'd suggest this implementation:

public override int GetHashCode()
{
    return unchecked( this.X * p1 + this.Y * p2 );
}

public override bool Equals(object obj) 
{
    var other = obj as GreaterThan32Bits;
    // you must do the null test after the cast, otherwise the
    // function crashes when obj is not a GreaterThan32Bits instance
    if (ReferenceEquals(other, null)) return false;
    return this.X == other.X && this.Y == other.Y;
}

Where p1 and p2 are large primes. This usually results in a good hash function (few hash collisions -> Dictionary becomes efficient). If the X and Y values are independent (for example, you don't expect many points on a straight line like X=Y), then even something simple like X ^ Y can be a good hash function.

But then again, you only need a good hash function if you actually use the class as keys in a dictionary (or other hash table).

In fact, it's perfectly fine to always return 0 in GetHashCode() and only implement Equals(). Dictionary will still work correctly with such objects as keys, it'll just be inefficient.

Daniel
"It'll just be inefficient." Hugely inefficient. It devolves to a linear search.
Jim Mischel