views:

764

answers:

7

After reading all the questions and answers on StackOverflow concerning overriding GetHashCode() I wrote the following extension method for easy and convenient overriding of GetHashCode():

public static class ObjectExtensions
{
    private const int _seedPrimeNumber = 691;
    private const int _fieldPrimeNumber = 397;
    public static int GetHashCodeFromFields(this object obj, params object[] fields) {
        unchecked { //unchecked to prevent throwing overflow exception
            int hashCode = _seedPrimeNumber;
            for (int i = 0; i < fields.Length; i++)
                if (fields[i] != null)
                    hashCode *= _fieldPrimeNumber + fields[i].GetHashCode();
            return hashCode;
        }
    }
}

(I basically only refactored the code that someone posted there, because I really like that it can be used generally)

which I use like this:

    public override int GetHashCode() {
        return this.GetHashCodeFromFields(field1, field2, field3);
    }

Do you see any problems with this code?

+2  A: 

I looks pretty good to me, I only have one issue: It is a shame that you have to use an object[] to pass in the values as this will box any value types you send to the function. I don't think you have much of a choice though, unless you go the route of creating some generic overloads like others have suggested.

Andrew Hare
IIRC a virtual function call on a valuetype will box the value.
leppie
only if the type does NOT supply an implementation for the method
ShuggyCoUk
A: 

On general principle you should scope your unchecked as narrowly as you reasonably can, though it doesn't matter much here. Other than that, looks fine.

chaos
+1  A: 
public override int GetHashCode() {
    return this.GetHashCodeFromFields(field1, field2, field3, this);
}

(yes, I'm very pedantic but this is the only problem that I see)

dfa
Wouldn't this cause an infinite loop?
Hermann
sure, I don't even know how to fix that
dfa
if you're trying to protect your generic function against a stupid developer, I guess you could check that fields[i] != obj, but this is one case where I'd just let it crash and burn if the user is that stupid....
Jonathan
you cannot fix the human stupidity... in C# :) :)
dfa
+1  A: 

That looks like a solid way to do it.

My only suggestion is that if you're really concerned about performance with it, you may want to add generic versions for several common cases (ie. probably 1-4 args). That way, for those objects (which are most likely to be small, key-style composite objects), you won't have the overhead of building the array to pass to the method, the loop, any boxing of generic values, etc. The call syntax will be exactly the same, but you'll run slightly more optimized code for that case. Of course, I'd run some perf tests over this before you decide whether it's worth the maintenance trade-off.

Something like this:

public static int GetHashCodeFromFields<T1,T2,T3,T4>(this object obj, T1 obj1, T2 obj2, T3 obj3, T4 obj4) {
    int hashCode = _seedPrimeNumber;
    if(obj1 != null)
        hashCode *= _fieldPrimeNumber + obj1.GetHashCode();
    if(obj2 != null)
        hashCode *= _fieldPrimeNumber + obj2.GetHashCode();
    if(obj3 != null)
        hashCode *= _fieldPrimeNumber + obj3.GetHashCode();
    if(obj4 != null)
        hashCode *= _fieldPrimeNumber + obj4.GetHashCode();
    return hashCode;
}
Jonathan
Thanks Jonathan, I think this would solve all the performance problems that you and others have pointed out. I also think the code generation solution was great but I like this more.
Hermann
+1  A: 

More optimal:

  1. Create a code generator that uses reflection to look through your business object fields and creates a new partial class which overrides GetHashCode() (and Equals()).
  2. Run the code generator when your program starts up in debug mode, and if the code has changed, exit with a message to the developer to recompile.

The advantages of this are:

  • Using reflection you know which fields are value types or not, and hence whether they need null checks.
  • There are no overheads - no extra function calls, no list construction, etc. This is important if you are doing lots of dictionary lookups.
  • Long implementations (in classes with lots of fields) are hidden in partial classes, away from your important business code.

Disadvantages:

  • Overkill if you don't do lots of dictionary lookups/calls to GetHashCode().
+1  A: 

I should point out that you should almost never do allocation while implementing GetHashCode (here's some useful blog posts about it).

The way that params works (generating a new array on the fly) means this is really not a good general solution. You would be better using a method call per field and maintaiing the hash state as a variable passed to them (this makes it easy to use better hashing functions and avalanching too).

ShuggyCoUk
Thank you for the links Shuggy. Very interesting. I didn't know that the GetHashCode implementations of the WPF Pen, Font, Color and Brush were so 'heavy'. I am using these a lot but fortunately not as keys in my dictionaries.
Hermann
They may have fixed this since that post...I have a few nifty functions for chaining hashes together (using Jenkins one at a time hash) I'll try to pop it in later
ShuggyCoUk
+2  A: 

I wrote some stuff a little while back that you might solve your problem... (And actually, it could probably be improved to include the seed that you have...)

Anyway, the project is called Essence ( http://essence.codeplex.com/ ), and it uses the System.Linq.Expression libraries to generate (based on attributes) standard representations of Equals/GetHashCode/CompareTo/ToString, as well as being able to create IEqualityComparer and IComparer classes based on an argument list. (I also have some further ideas, but would like to get some community feedback before continuing too much further.)

(What this means is that it's almost as fast as being handwritten - the main one where it isn't is the CompareTo(); cause the Linq.Expressions doesn't have the concept of a variable in the 3.5 release - so you have to call CompareTo() on the underlying object twice when you don't get a match. Using the DLR extensions to Linq.Expressions solves this. I suppose I could have used the emit il, but I wasn't that inspired at the time.)

It's quite a simple idea, but I haven't seen it done before.

Now the thing is, I kind of lost interest in polishing it (which would have included writing an article for codeproject, documenting some of the code, or the like), but I might be persuaded to do so if you feel it would be something of interest.

(The codeplex site doesn't have a downloadable package; just go to the source and grab that - oh, it's written in f# (although all the test code is in c#) as that was the thing I was interested in learning.)

Anyway, here is are c# example from the test in the project:

    // --------------------------------------------------------------------
    // USING THE ESSENCE LIBRARY:
    // --------------------------------------------------------------------
    [EssenceClass(UseIn = EssenceFunctions.All)]
    public class TestEssence : IEquatable<TestEssence>, IComparable<TestEssence>
    {
        [Essence(Order=0] public int MyInt           { get; set; }
        [Essence(Order=1] public string MyString     { get; set; }
        [Essence(Order=2] public DateTime MyDateTime { get; set; }

        public override int GetHashCode()                                { return Essence<TestEssence>.GetHashCodeStatic(this); }
    ...
    }

    // --------------------------------------------------------------------
    // EQUIVALENT HAND WRITTEN CODE:
    // --------------------------------------------------------------------
    public class TestManual
    {
        public int MyInt;
        public string MyString;
        public DateTime MyDateTime;

        public override int GetHashCode()
        {
            var x = MyInt.GetHashCode();
            x *= Essence<TestEssence>.HashCodeMultiplier;
            x ^= (MyString == null) ? 0 : MyString.GetHashCode();
            x *= Essence<TestEssence>.HashCodeMultiplier;
            x ^= MyDateTime.GetHashCode();
            return x;
        }
    ...
    }

Anyway, the project, if anyone thinks is worthwhile, needs polishing, but the ideas are there...

Paul Westcott
Nice, but definitely much slower than hand-written impl.
Alex Yakunin
No, not really.Go to http://essence.codeplex.com/ and give it a burl!
Paul Westcott