views:

4390

answers:

6

In .NET System.Object.GetHashCode method is use in a lot of places throughout the .NET base class libraries. Especially when finding items in a collection fast or to determine equality. Is there a standard algorithm/ best practise on how to implement the GetHashCode override for my custom classes so I don't degrade performance?

+105  A: 

I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

public int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

This is better than the common practice of XORing hashcodes for two main reasons. Suppose we have a type with two int fields:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

EDIT: This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right.

Jon Skeet
The algorithm described in the book you mention is infact a little more detailed it especailly describes what to do for different data types of the fields. E.g.: for fields of type long use (int)(field ^ f >>> 32) instead of simply calling GetHashcode. Is long.GetHashCodes implemented that way ?
bitbonk
Yup, Int64.GetHashCode does exactly that. In Java that would require boxing, of course. That reminds me - time to add a link to the book...
Jon Skeet
Would be a great idea to wrap the arithmetic in an unchecked block to prevent overflow.
spender
@Jon Skeet: I now have this in a `HashUtils` static class:`public static int GetJonSkeetHashCode(params object[] fields)``{` `return fields.Aggregate(17, (hashSoFar, field) => 23 * hashSoFar + (field ?? 0).GetHashCode());``}`
Ani
@Ani: If you're implementing hash codes sufficiently often that that's useful, go for it :)
Jon Skeet
A: 

Most of my work is done with database connectivity which means that my classes all have a unique identifier from the database. I always use the ID from the database to generate the hashcode.

// Unique ID from database
private int _id;

...    
{
  return _id.GetHashCode();
}
Mark G
That means that if you have objects Person and Account and they both have and ID = 1, they will have the same hash code. And that is not ok.
Petar Repac
Actually the comment above is incorrect. There will always be the possibility of hash-code collisions (a hash code only locates the bucket, not the individual object).So such an implementation - for a hashcode containing mixed objects - would lead to a lot of collisions, which is undesirable, but it would be absolutely fine if you only ever had objects of a single type in your hashtables.Also it doesn't distribute evenly, however neither does the base implementation on system.object, so I wouldn't worry about it too much...
piers7
+9  A: 

I have a Hashing class in Helper library that I use it for this purpose.

 /// <summary>
 /// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
 /// Also, some simple optimizations to the algorithm in order to speed up
 /// its hashing process have been added. from: www.partow.net
 /// </summary>
 /// <param name="input">array of objects, parameters combination that you need
 /// to get a unique hash code for them</param>
 /// <returns>Hash code</returns>
 public static int RSHash(params object[] input)
 {
  const int b = 378551;
  int a = 63689;
  int hash = 0;

  // I have added the unchecked keyword to make sure 
  // not get an overflow exception.
  // It can be enhanced later by catching the OverflowException.

  unchecked
  {
   for (int i = 0; i < input.Length; i++)
   {
    if (input[i] != null)
    {
     hash = hash * a + input[i].GetHashCode();
     a = a * b;
    }
   }
  }

  return hash;
 }

Then, simply you can use it as:

public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}

I didn't assess its performance, so any feedback is welcomed.

Waheed Sayed
Looks cool, tnx :-)
Yacoder
Well, it will cause boxing, if fields are value types.
nightcoder
+4  A: 
Bert Huijben
"In most cases where Equals() compares multiple fields it doesn't really matter if your GetHash() hashes on one field or on many." This is dangerous advice, because for objects which only differ in the un-hashed fields, you will get hash collisions. If this happens frequently, the performance of hash-based collections (HashMap, HashSet etc.) will degrade (up to O(n) in the worst case).
sleske
This actually happened in Java: In early versions of the JDK String.hashCode() only considered the beginning of the string; this lead to performance problems if you used Strings as keys in HashMaps which only differed at the end (which is common e.g. for URLs). The algorithm was therefore changed ( in JDK 1.2 or 1.3 I believe).
sleske
If that one field 'provides a good distribution' (last part of my answer), then one field is enough.. If it **doesn't provide a good distribution**, then (and just then) you need another calculation. (E.g. just use another field that **does** provide a good distribution, or use multiple fields)
Bert Huijben
+6  A: 

Here is my hashcode helper.
It's advantage is that it uses generic type arguments and therefore will not cause boxing:

public static class HashHelper
{
    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            return 31 * arg1.GetHashCode() + arg2.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            return 31 * hash + arg3.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            hash = 31 * hash + arg3.GetHashCode();
            return 31 * hash + arg4.GetHashCode();
        }
    }

    public static int GetHashCode<T>(T[] list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    public static int GetHashCode<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    /// <summary>
    /// Gets a hashcode for a collection for that the order of items does not matter.
    /// So {1, 2, 3} and {3, 2, 1} will get same hash code.
    /// </summary>
    public static int GetHashCodeForOrderNoMatterCollection<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            int count = 0;
            foreach (var item in list)
            {
                hash += item.GetHashCode();
                count++;
            }
            return 31 * hash + count.GetHashCode();
        }
    }

    /// <summary>
    /// Alternative way to get a hashcode is to use a fluent interface like this:<br />
    /// return 0.CombineHashCode(field1).CombineHashCode(field2).CombineHashCode(field3);
    /// </summary>
    public static int CombineHashCode<T>(this int hashCode, T arg)
    {
        unchecked
        {
            return 31 * hashCode + arg.GetHashCode();   
        }
    }
}

Also it has extension method to provide a fluent interface, so you can use it like this:

public override int GetHashCode()
{
  return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}

or like this:

public override int GetHashCode()
{
  return 0.CombineHashCode(Manufacturer)
          .CombineHashCode(PartN)
          .CombineHashCode(Quantity);
}
nightcoder
Very handy. Thanks.
Bklyn
+1  A: 

This is a good one:

    /// <summary>
/// Helper class for generating hash codes suitable 
/// for use in hashing algorithms and data structures like a hash table. 
/// </summary>
public static class HashCodeHelper
{
    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data structures like a hash table. 
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;

    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3, T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }

    private static int GetHashCodeInternal(int key1, int key2)
    {
        int h = key1;
        h += (h << 10);
        h ^= (h >> 6);

        h += key2;
        h += (h << 10);
        h ^= (h >> 6);

        h += (h << 3);
        h ^= (h >> 11);
        h += (h << 15);

        return h;
    }
}

And here is how to use it:

private struct Key
    {
        private Type _type;
        private string _field;

        public Type Type { get { return _type; } }
        public string Field { get { return _field; } }

        public Key(Type type, string field)
        {
            _type = type;
            _field = field;
        }

        public override int GetHashCode()
        {
            return HashCodeHelper.GetHashCode(_field, _type);
        }

        public override bool Equals(object obj)
        {
            if (!(obj is Key))
                return false;
            var tf = (Key)obj;
            return tf._field.Equals(_field) && tf._type.Equals(_type);
        }
    }
Magnus Persson
@Magnus, can you explain *why* it's a good one?
David Rutten
How are the Keys determined? GetHashCode() doesn't take any parameters, so it needs to call this one with two Keys that need to be determined somehow. Sorry, without further explanation this only looks clever, but not that good.
Michael Stum
It't hash code generation part of my hash code helper class.
Magnus Persson
I'll post the complete code.
Magnus Persson
And why do you need the generic overloads? The type is not important (and not used in your code) since *all* objects have a `GetHashCode()` method, so you can always use the method with the `params` array parameter. Or am I missing something here?
gehho
It's about performance, avoid the loop for smaller <= 4 fields. But I guess the generics could be skipped and just use object instead.
Magnus Persson