views:

90

answers:

3

In one of my aplications I have to use many dictonarys with custom objects as keys. To improve the performance of the lookups I implemetet an base class that overrites GetHashCode. It seams to work but somehow I still have a bad fealing about it so I decided to post my code and I would be gratefull for any tips or coments. (omg I forgot the code :D )

abstract class FastHashed
{
    private static Dictionary<Type,ulong> _instanceCounters = new Dictionary<Type,ulong>();

    private int hash;

    protected FastHashed()
    {
        Type instanceType = this.GetType();
        if(! _instanceCounters.ContainsKey(instanceType)) _instanceCounters.Add(instanceType,0);  
        this.hash = ((instanceType.ToString())+(_instanceCounters[instanceType]++.ToString())).GetHashCode();
    }

    public override int  GetHashCode()
    {
         return hash;
    }
}

Edit: Do not mess with the hashing if you do not have to. This "sollution" is slower and less reliable then the default GetHashCode().

Edit: I did some performance testing with the Equatec profiler and a simple console aplication.

class Program { static readonly int cycles = 50000; static Dictionary objectsDict = new Dictionary(); static Dictionary foosDict = new Dictionary();

    static void Main(string[] args)
    {

        foo[] foos = new foo[cycles];
        object[] objects = new object[cycles];


        for (int i = 0; i < cycles; i++)
        {
            foos[i] = new foo();
            objects[i] = new object();
            foosDict.Add(foos[i], i);
            objectsDict.Add(objects[i], i);
        }

        ObjectHash(objects);
        FooHash(foos);

    }

    static void ObjectHash(Object[] objects)
    {
        int value; 
        for (int i = 0; i < cycles; i++)
        {
            value = objectsDict[objects[i]];
        }

    }
    static void FooHash(foo[] foos)
    {
        int value;
        for (int i = 0; i < cycles; i++)
        {
            value = foosDict[foos[i]];
        }
    }

    class foo
    {
        private readonly int _hash;
        public foo()
        {
            _hash = this.GetHashCode();
        }
        public override int GetHashCode()
        {
            return _hash;
        }
    }
}

The results: - FooHash 26 774 ms - ObjectHash 7 ms

Obviously the defualt GetHashCode is the best choice.

+2  A: 
  1. This is not thread-safe.
  2. If you only care about reference equality, why do you have different counters for different types?

If all you want is to prevent Hashes from being computed multiple times, why not something like this (or a variant with generics if the dictionary will only hold objects of a certain type):

  public class ObjectWithCachedHashCode : IEquatable<ObjectWithCachedHashCode>
    {
        private int _cachedHashCode;

        public object Object { get; private set; }

        public ObjectWithCachedHashCode(object obj)
        {
            Object = obj;
            _cachedHashCode = obj.GetHashCode();
        }

        public override int GetHashCode()
        {
            return _cachedHashCode;
        }

        public bool Equals(ObjectWithCachedHashCode other)
        {
            return other!=null && Object.Equals(other.Object);
        }

        public override bool Equals(object other)
        {
            return Equals(other as ObjectWithCachedHashCode);
        }


    }

Edit: Made class compatible with Dictionary

Ani
OK the thread-safety kills this concept. If I have to synchronise the constructor calls this will kill my performance, because I have to genrate lots of objects in multiple threads. Are you sure obj.GetHashCode() will return an unique value? The idea behind using the typename and an individual counter was to ensure that the hash code is unique in different derived classes.
Oliver
1. GetHashCode() does not have to be unique, most key-value structures see it as a necessary but insufficient condition for equality. E.g., the Dictionary<K,V> organizes values into hashbuckets, but checks for full-equality within each bucket during retrievals.2. If at all possible, try overriding GetHashCode() sensibly in the derived classes. If this is not an option, stick with Object.GetHashCode(), since you only care about equality by reference anyway. This is bound to be implemented in a way that reduces collisions, although it is not speced, and implementaions may change in the future.
Ani
@Ani: Your Cached hash code will onnly be used by those using the HashCode property (which is no one). Everyone who calls GetHashCode (which is everyone) will still get the computed every time hash code.
James Curran
Yes, that's right. I've updated it to be standardized.Also note that I have delibrately not handled the case where Object is null; the class should be modified to what the user considers to be the right behaviour in this case.
Ani
Thanks a lot. In my case I´ll stick with caching the Object.GetHashCodebecause I can not derive an unique value from the object propertys. Additonal pro: this implementation is much simpler and I can easily use it on classes derived from other classes and use the parents hash.
Oliver
No problem. :) What I am interested in is whether you are certain the hash computations are indeed the performance bottleneck, and why that is so.
Ani
The aplication is doing lots of lookups so Ithought I´ll search the web for methods to speedup the lookup. Somewher I stumblep upon the sugestion to use an cached int as hash and increment it on each constructor call. This seamed to be not very save so I tryed creating an unique string and use its hash instead.
Oliver
+2  A: 

You can mark the hash variable readonly.

But to be honest, in C# where you have single inheritance it is not always wise to "waste" the inheritance to implement such specific behavior. Suppose you suddenly wants to inherit from a base class that "does" something. Save class inheritance to modelling purposes, not implementing details.

Albin Sunnanbo
In most cases your absolutly right.But I´m using this base class for some classes that inherit from object and severall interfaces, so wasting my inheritance is not a problem.
Oliver
I read an article and a discussion about how to speed up the lookup. Seas however wrote it had no Idea what he was doing ;)
Oliver
A: 

As far as I can see, this is just functionally equivalent to the object.GetHashCode() default implemntation, apart from being slower and non thread-safe. What is it that makes "Fast Hash" fast?

Jon Hanna
I agree that is not thread safe. But why is it slower? Does Object.GetHashCode cash its result? I found prlenty of threads that suggest to ceate a private Hash property and return it, instead of creating a new hash on each call. BUt as I mentioned in my question I do have a bad fealing about this.
Oliver
object.GetHashCode() has a quickly obtained result based on initial location.This has an implementation based on a lookup of a hash of objects that use that same object.GetHashCode() in construction, to seed a different quick-obtained result.All you've done is add to the time of creation (especially when you fix it to real code that handles multi-threading, which in itself makes a object creation - normally thread-safe - a potential point of contention with every other object derived from this type).That locking will make it potentially much slower (creation time infinite in worse case)
Jon Hanna
For that matter, since you are hashing on identity rather than on value (each object has a different hash-code even if all properties are identical), and since they are your own objects, could you perhaps add the "values" to the "key" objects as properties, rather than hashing on them? Each object will key to a unique value anyway, so rather than looking up based on a hash, why not just access that property?
Jon Hanna
Thank you for your detailed answer. I wrote this because I read something very different about GerHashCode and its performance. What you say makes a lot more senses.
Oliver
Since your hash production is - considered as a hash - no better than `object`'s and your only concern is speed of returning the hash, you could just cache the default result: `protected FastHashed(){hash = base.GetHashCode();}`I'm not sure I'd bother, though I've certainly memoised `GetHashCode()` on value types, but it would be worth experimenting with (call GetHashCode for the same object thousands of times).
Jon Hanna