In .NET System.Object.GetHashCode
method is use in a lot of places throughout the .NET base class libraries. Especially when finding items in a collection fast or to determine equality. Is there a standard algorithm/ best practise on how to implement the GetHashCode
override for my custom classes so I don't degrade performance?
views:
4390answers:
6I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:
public int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + field1.GetHashCode();
hash = hash * 23 + field2.GetHashCode();
hash = hash * 23 + field3.GetHashCode();
return hash;
}
}
This is better than the common practice of XOR
ing hashcodes for two main reasons. Suppose we have a type with two int
fields:
XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.
EDIT: This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right.
Most of my work is done with database connectivity which means that my classes all have a unique identifier from the database. I always use the ID from the database to generate the hashcode.
// Unique ID from database
private int _id;
...
{
return _id.GetHashCode();
}
I have a Hashing class in Helper library that I use it for this purpose.
/// <summary>
/// This is a simple hashing function from Robert Sedgwicks Hashing in C book.
/// Also, some simple optimizations to the algorithm in order to speed up
/// its hashing process have been added. from: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
const int b = 378551;
int a = 63689;
int hash = 0;
// I have added the unchecked keyword to make sure
// not get an overflow exception.
// It can be enhanced later by catching the OverflowException.
unchecked
{
for (int i = 0; i < input.Length; i++)
{
if (input[i] != null)
{
hash = hash * a + input[i].GetHashCode();
a = a * b;
}
}
}
return hash;
}
Then, simply you can use it as:
public override int GetHashCode()
{
return Hashing.RSHash(_field1, _field2, _field3);
}
I didn't assess its performance, so any feedback is welcomed.
Here is my hashcode helper.
It's advantage is that it uses generic type arguments and therefore will not cause boxing:
public static class HashHelper
{
public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
{
unchecked
{
return 31 * arg1.GetHashCode() + arg2.GetHashCode();
}
}
public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
{
unchecked
{
int hash = arg1.GetHashCode();
hash = 31 * hash + arg2.GetHashCode();
return 31 * hash + arg3.GetHashCode();
}
}
public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
{
unchecked
{
int hash = arg1.GetHashCode();
hash = 31 * hash + arg2.GetHashCode();
hash = 31 * hash + arg3.GetHashCode();
return 31 * hash + arg4.GetHashCode();
}
}
public static int GetHashCode<T>(T[] list)
{
unchecked
{
int hash = 0;
foreach (var item in list)
{
hash = 31 * hash + item.GetHashCode();
}
return hash;
}
}
public static int GetHashCode<T>(IEnumerable<T> list)
{
unchecked
{
int hash = 0;
foreach (var item in list)
{
hash = 31 * hash + item.GetHashCode();
}
return hash;
}
}
/// <summary>
/// Gets a hashcode for a collection for that the order of items does not matter.
/// So {1, 2, 3} and {3, 2, 1} will get same hash code.
/// </summary>
public static int GetHashCodeForOrderNoMatterCollection<T>(IEnumerable<T> list)
{
unchecked
{
int hash = 0;
int count = 0;
foreach (var item in list)
{
hash += item.GetHashCode();
count++;
}
return 31 * hash + count.GetHashCode();
}
}
/// <summary>
/// Alternative way to get a hashcode is to use a fluent interface like this:<br />
/// return 0.CombineHashCode(field1).CombineHashCode(field2).CombineHashCode(field3);
/// </summary>
public static int CombineHashCode<T>(this int hashCode, T arg)
{
unchecked
{
return 31 * hashCode + arg.GetHashCode();
}
}
}
Also it has extension method to provide a fluent interface, so you can use it like this:
public override int GetHashCode()
{
return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}
or like this:
public override int GetHashCode()
{
return 0.CombineHashCode(Manufacturer)
.CombineHashCode(PartN)
.CombineHashCode(Quantity);
}
This is a good one:
/// <summary>
/// Helper class for generating hash codes suitable
/// for use in hashing algorithms and data structures like a hash table.
/// </summary>
public static class HashCodeHelper
{
/// <summary>
/// Returns a hash code for the specified objects
/// </summary>
/// <param name="arr">An array of objects used for generating the hash code.</param>
/// <returns>
/// A hash code, suitable for use in hashing algorithms and data structures like a hash table.
/// </returns>
public static int GetHashCode(params object[] arr)
{
int hash = 0;
foreach (var item in arr)
hash = GetHashCodeInternal(hash, item.GetHashCode());
return hash;
}
/// <summary>
/// Returns a hash code for the specified objects
/// </summary>
/// <param name="obj1">The first object.</param>
/// <param name="obj2">The second object.</param>
/// <param name="obj3">The third object.</param>
/// <param name="obj4">The fourth object.</param>
/// <returns>
/// A hash code, suitable for use in hashing algorithms and data structures like a hash table.
/// </returns>
public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3, T4 obj4)
{
return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
}
/// <summary>
/// Returns a hash code for the specified objects
/// </summary>
/// <param name="obj1">The first object.</param>
/// <param name="obj2">The second object.</param>
/// <param name="obj3">The third object.</param>
/// <returns>
/// A hash code, suitable for use in hashing algorithms and data structures like a hash table.
/// </returns>
public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
{
return GetHashCode(obj1, GetHashCode(obj2, obj3));
}
/// <summary>
/// Returns a hash code for the specified objects
/// </summary>
/// <param name="obj1">The first object.</param>
/// <param name="obj2">The second object.</param>
/// <returns>
/// A hash code, suitable for use in hashing algorithms and data structures like a hash table.
/// </returns>
public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
{
return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
}
private static int GetHashCodeInternal(int key1, int key2)
{
int h = key1;
h += (h << 10);
h ^= (h >> 6);
h += key2;
h += (h << 10);
h ^= (h >> 6);
h += (h << 3);
h ^= (h >> 11);
h += (h << 15);
return h;
}
}
And here is how to use it:
private struct Key
{
private Type _type;
private string _field;
public Type Type { get { return _type; } }
public string Field { get { return _field; } }
public Key(Type type, string field)
{
_type = type;
_field = field;
}
public override int GetHashCode()
{
return HashCodeHelper.GetHashCode(_field, _type);
}
public override bool Equals(object obj)
{
if (!(obj is Key))
return false;
var tf = (Key)obj;
return tf._field.Equals(_field) && tf._type.Equals(_type);
}
}