tags:

views:

4486

answers:

7

This question comes out of the discussion on tuples.

I started thinking about the hash code that a tuple should have. What if we will accept KeyValuePair class as a tuple? It doesn't override the GetHashCode() method, so probably it won't be aware of the hash codes of it's "children"... So, run-time will call Object.GetHashCode(), which is not aware of the real object structure.

Then we can make two instances of some reference type, which are actually Equal, because of the overloaded GetHashCode() and Equals(). And use them as "children" in tuples to "cheat" the dictionary.

But it doesn't work! Run-time somehow figures out the structure of our tuple and calls the overloaded GetHashCode of our class!

How does it work? What's the analysis made by Object.GetHashCode()?

Can it affect the performance in some bad scenario, when we use some complicated keys? (probably, impossible scenario... but still)

Consider this code as an example:

namespace csharp_tricks
{
    class Program
    {
        class MyClass
        {
            int keyValue;
            int someInfo;

            public MyClass(int key, int info)
            {
                keyValue = key;
                someInfo = info;
            }

            public override bool Equals(object obj)
            {
                MyClass other = obj as MyClass;
                if (other == null) return false;

                return keyValue.Equals(other.keyValue);
            }

            public override int GetHashCode()
            {
                return keyValue.GetHashCode();
            }
        }

        static void Main(string[] args)
        {
            Dictionary<object, object> dict = new Dictionary<object, object>();

            dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 1), 1), 1);

            //here we get the exception -- an item with the same key was already added
            //but how did it figure out the hash code?
            dict.Add(new KeyValuePair<MyClass,object>(new MyClass(1, 2), 1), 1); 

            return;
        }
    }
}

Update I think I've found an explanation for this as stated below in my answer. The main outcomes of it are:

  • Be careful with your keys and their hash codes :-)
  • For complicated dictionary keys you must override Equals() and GetHashCode() correctly.
A: 

I don't have the book reference anymore, and I'll have to find it just to confirm, but I thought the default base hash just hashed together all of the members of your object. It got access to them because of the way the CLR worked, so it wasn't something that you could write as well as they had.

That is completely from memory of something I briefly read so take it for what you will.

Edit: The book was Inside C# from MS Press. The one with the Saw blade on the cover. The author spent a good deal of time explaining how things were implemented in the CLR, how the language translated down to MSIL, ect. ect. If you can find the book it's not a bad read.

Edit: Form the link provided it looks like

Object.GetHashCode() uses an internal field in the System.Object class to generate the hash value. Each object created is assigned a unique object key, stored as an integer,when it is created. These keys start at 1 and increment every time a new object of any type gets created.

Hmm I guess I need to write a few of my own hash codes, if I expect to use objects as hash keys.

Dan Blair
This explanation contradicts with the code example in the question.
Yacoder
+5  A: 

This is an excellent article on GetHashCode from Effective C#: http://www.awprofessional.com/content/images/0321245660/items/wagner_item10.pdf

torial
Interesting article, but it obviously contains wrong explanation of how Object.GetHashCode works. If it would be as described in the article there wouldn't happen the exception...
Yacoder
The GetHashCode example implementation is trivial to the point of uselessness. What if more than one field participate in the equality of the object? This seems like the more common situation to me.
Chris Ammerman
@Turbulent: In that case you should use XOR operation (^), see other answers for the question
Yacoder
+7  A: 

Don't override GetHashcode() and Equals() on mutable classes, only override it on immutable classes or structures, else if you modify a object used as key the hash table won't function properly anymore (you won't be able to retrieve the value associated to the key after the key object was modified)

Also hash tables don't use hashcodes to identify objects they use the key objects themselfes as identifiers, it's not required that all keys that are used to add entries in a hash table return different hashcodes, but it is recommended that they do, else performance suffers greatly.

Pop Catalin
Nice, but it's not an answer.
Yacoder
But how could they idenfity the object without generating a hash? Isn't that the point of the GetHashCode?
Cory R. King
Cory, the hashcode of a key is a number used to quickly compute the location of a bucket whithin the hashtable, (a bucket is a key value pair, or more kv pairs), after the location is computed if the bucket contains the key (equality check)...
Pop Catalin
... (if the key equals the key from that bucket) then the value is returned, else the position is rehashed and another location is probed, The algorithm ends when the bucket probed is empty or all buckets have been probed.
Pop Catalin
So, a hash code doesn't indicate the the value, but the fist position in hash table to be searched for the key.
Pop Catalin
+2  A: 

Check out this post by Brad Abrams and also the comment by Brian Grunkemeyer for some more information on how object.GetHashCode works. Also, take a look at the first comment on Ayande's blog post. I don't know if the current releases of the Framework still follow these rules or if they have actually changed it like Brad implied.

Scott Dorman
These explanations contradict with the given code as well. Somehow run-time gets inside MyClass.GetHashCode() and uses it to obtain KeyValuePair hash code. But what exactly the run-time doing?
Yacoder
A: 
Cory R. King
KeyValuePair doesn't implement GetHashCode. Your example is TOTALLY wrong. Open MSDN: If two objects compare as equal, the GetHashCode method for each object must return the same value.
Yacoder
+1  A: 

It seems that I have a clue now.

I thought KeyValuePair is a reference type, but it is not, it is a struct. And so it uses ValueType.GetHashCode() method. MSDN for it says: "One or more fields of the derived type is used to calculate the return value".

If you will take a real reference type as a "tuple-provider" you'll cheat the dictionary (or yourself...).

using System.Collections.Generic;

namespace csharp_tricks
{
    class Program
    {
        class MyClass
        {
            int keyValue;
            int someInfo;

            public MyClass(int key, int info)
            {
                keyValue = key;
                someInfo = info;
            }

            public override bool Equals(object obj)
            {
                MyClass other = obj as MyClass;
                if (other == null) return false;

                return keyValue.Equals(other.keyValue);
            }

            public override int GetHashCode()
            {
                return keyValue.GetHashCode();
            }
        }

        class Pair<T, R>
        {
            public T First { get; set; }
            public R Second { get; set; }
        }

        static void Main(string[] args)
        {
            var dict = new Dictionary<Pair<int, MyClass>, object>();

            dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 2) }, 1);

            //this is a pair of the same values as previous! but... no exception this time...
            dict.Add(new Pair<int, MyClass>() { First = 1, Second = new MyClass(1, 3) }, 1);

            return;
        }
    }
}
Yacoder
+1  A: 

Here are the proper Hash and equality implementations for the Quad tuple (contains 4 tuple components inside). This code ensures proper usage of this specific tuple in HashSets and the dictionaries.

More on the subject (including the source code) here.

Note usage of the unchecked keyword (to avoid overflows) and throwing NullReferenceException if obj is null (as required by the base method)

public override bool Equals(object obj)
{
 if (ReferenceEquals(null, obj))
  throw new NullReferenceException("obj is null");
 if (ReferenceEquals(this, obj)) return true;
 if (obj.GetType() != typeof (Quad<T1, T2, T3, T4>)) return false;
 return Equals((Quad<T1, T2, T3, T4>) obj);
}

public bool Equals(Quad<T1, T2, T3, T4> obj)
{
 if (ReferenceEquals(null, obj)) return false;
 if (ReferenceEquals(this, obj)) return true;
 return Equals(obj.Item1, Item1)
  && Equals(obj.Item2, Item2)
   && Equals(obj.Item3, Item3)
    && Equals(obj.Item4, Item4);
}

public override int GetHashCode()
{
 unchecked
 {
  int result = Item1.GetHashCode();
  result = (result*397) ^ Item2.GetHashCode();
  result = (result*397) ^ Item3.GetHashCode();
  result = (result*397) ^ Item4.GetHashCode();
  return result;
 }
}
public static bool operator ==(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
 return Equals(left, right);
}


public static bool operator !=(Quad<T1, T2, T3, T4> left, Quad<T1, T2, T3, T4> right)
{
 return !Equals(left, right);
}
Rinat Abdullin
+1 for the correct implementation of GetHashCode, however I believe you shouldn't be throwing exception from Equals(object) - this implementation is also inconsistent: quad.Equals(null) throws NullReferenceException while quad.Equals((Quad)null) returns false :-).
Milan Gardian