tags:

views:

2678

answers:

12

I have a structure in C#:

public struct UserInfo
{
   public string str1
   {
     get;
     set;
   }

   public string str2
   {
     get;
     set;
   }   
}

The only rule is that UserInfo(str1="AA", str2="BB").Equals(UserInfo(str1="BB", str2="AA"))

How to override the GetHashCode function for this structure?

A: 

Many possibilities. E.g.

return str1.GetHashCode() ^ str1.GetHashCode()

VolkerK
A: 

Perhaps something like str1.GetHashCode() + str2.GetHashCode()? or (str1.GetHashCode() + str2.GetHashCode()) / 2? This way it would be the same regardless of whether str1 and str2 are swapped....

Mike Stone
A: 

Try out this one:

(((long)str1.GetHashCode()) + ((long)str2.GetHashCode())).GetHashCode()
Artem Tikhomirov
A: 

Sort them, then concatenate them:

return ((str1.CompareTo(str2) < 1) ? str1 + str2 : str2 + str1)
    .GetHashCode();
Steve Morgan
This will cause your GetHashCode method to do quite a lot of work. Hash codes are intended to be quick. From MSDN: "A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object". Allocating a new string seems like a bad idea inside a hash function.
Wilka
A: 

GetHashCode's result is supposed to be:

  1. As fast as possible.
  2. As unique as possible.

Bearing those in mind, I would go with something like this:

if (str1 == null)
    if (str2 == null)
        return 0;
    else
       return str2.GetHashCode();
else
    if (str2 == null)
        return str1.GetHashCode();
    else
       return ((ulong)str1.GetHashCode() | ((ulong)str2.GetHashCode() << 32)).GetHashCode();

Edit: Forgot the nulls. Code fixed.

Omer van Kloeten
The only rule is that UserInfo(str1="AA", str2="BB").Equals(UserInfo(str1="BB", str2="AA"))
alfred barthand
+22  A: 

MSDN:

A hash function must have the following properties:

  • If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.
  • The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.
  • For the best performance, a hash function must generate a random distribution for all input.

Taking it into account correct way is:

return str1.GetHashCode() ^ str2.GetHashCode() 

^ can be substituted with other commutative operation

aku
Shouldn't that be return str1.GetHashCode() ^ str2.GetHashCode();
roomaroo
Also, doesn't consider null values.
Omer van Kloeten
roomaroo, thanks for correction. of course it's str1 ^ str2
aku
Omer van Kloeten, is should be obvious to any .net developer. quick sample intended to show general idea, not complete solution
aku
If you expect having str1,str2 and str2,str1 in your hash to be a very frequent occurance , lookup speed might be a bit slower than it should be. Lookup speed can also be increased by caching the hashcode. Obviously these may be premature optimizations.
Brian
+1 for pointing out the importance of using a commutative operation
Pandincus
what we can say for this string s1 = "hello"; string s2 = "hello"; (s1 == s2).Dump(); (s1.GetHashCode() == s2.GetHashCode()).Dump();
Tumbleweed
A: 

Ah yes, as Gary Shutler pointed out:

return str1.GetHashCode() + str2.GetHashCode();

Can overflow. You could try casting to long as Artem suggested, or you could surround the statement in the unchecked keyword:

return unchecked(str1.GetHashCode() + str2.GetHashCode());
Groky
A: 

Too complicated, and forgets nulls, etc. This is used for things like bucketing, so you can get away with something like

if(null !=str1) { return str1.GetHashCode(); } if(null !=str2) { return str2.GetHashCode(); } //Not sure what you would put here, some constant value will do return 0;

This is biased by assuming that str1 is not likely to be common in an unusually large proportion of instances.

Roger Willcocks
+1  A: 
public override int GetHashCode()   
{       
    unchecked      
    {           
        return(str1 != null ? str1.GetHashCode() : 0) ^ (str2 != null ? str2.GetHashCode() : 0);       
    }   
}
Why unchecked? xor can't overflow.
Konrad Rudolph
+3  A: 
public override int GetHashCode()
{
    unchecked
    {
        return (str1 ?? String.Empty).GetHashCode() +
            (str2 ?? String.Empty).GetHashCode();
    }
}

Using the '+' operator might be better than using '^', because although you explicitly want ('AA', 'BB') and ('BB', 'AA') to explicitly be the same, you may not want ('AA', 'AA') and ('BB', 'BB') to be the same (or all equal pairs for that matter).

The 'as fast as possible' rule is not entirely adhered to in this solution because in the case of nulls this performs a 'GetHashCode()' on the empty string rather than immediately return a known constant, but even without explicitly measuring I am willing to hazard a guess that the difference wouldn't be big enough to worry about unless you expect a lot of nulls.

+4  A: 
  1. As a general rule, a simple way to generate a hashcode for a class is to XOR all the data fields that can participate in generating the hash code (being careful to check for null as pointed out by others). This also meets the (artificial?) requirement that the hashcodes for UserInfo("AA", "BB") and UserInfo("BB", "AA") are the same.

  2. If you can make assumptions about the use of your class, you can perhaps improve your hash function. For example, if it is common for str1 and str2 to be the same, XOR may not be a good choice. But if str1 and str2 represent, say, first and last name, XOR is probably a good choice.

Although this is clearly not meant to be a real-world example, it may be worth pointing out that: - This is probably a poor example of use of a struct: A struct should normally have value semantics, which doesn't seem to be the case here. - Using properties with setters to generate a hash code is also asking for trouble.

Joe
Hmm, why do you think his struct doesn't have value semantics? And could you expand on your last sentence?
Stefan Monov
+3  A: 

See John Skeet's answer - binary operations like ^ are not good, they will often generate colliding hash!

Tomáš Kafka