views:

405

answers:

4

I've got multiple classes that, for certain reasons, do not follow the official Equals contract. In the overwritten GetHashCode() these classes simply return 0 so they can be used in a Hashmap.

Some of these classes implement the same interface and there are Hashmaps using this interface as key. So I figured that every class should at least return a different (but still constant) value in GetHashCode().

The question is how to select this value. Should I simply let the first class return 1, the next class 2 and so on? Or should I try something like

class SomeClass : SomeInterface {
    public overwrite int GetHashCode() {
        return "SomeClass".GetHashCode();
    }
}

so the hash is distributed more evenly? (Do I have to cache the returned value myself or is Microsoft's compiler able to optimize this?)

Update: It is not possible to return an individual hashcode for each object, because Equals violates the contract. Specifially, I'm refering to this problem.

+2  A: 

I'm curious what the reasoning would be for overriding GetHashCode() and returning a constant value. Why violate the idea of a hash rather than just violating the "contract" and not overriding the GetHashCode() function at all and leave the default implementation from Object?

Edit

If what you've done is that so you can have your objects match based on their contents rather than their reference then what you propose with having different classes simply use different constants can WORK, but is highly inefficient. What you want to do is come up with a hashing algorithm that can take the contents of your class and produce a value that balances speed with even distribution (that's hashing 101).

I guess I'm not sure what you're looking for...there isn't a "good" scheme for choosing constant numbers for this paradigm. One is not any better than the other. Try to improve your objects so that you're creating a real hash.

Adam Robinson
Leaving the default implementation could result in "equal" objects with different hashes, if I'm not mistaken. This would really mess up my program, so I'd rather not.
mafutrct
+2  A: 

If it "violates the Equals contract", then I'm not sure you should be using it as a key.

It something is using that as a key, you really need to get the hashing right... it is very unclear what the Equals logic is, but two values that are considered equal must have the same hash-code. It is not required that two values with the same hash-code are equal.

Using a constant string won't really help much - you'll get the values split evenly over the types, but that is about it...

Marc Gravell
You are absolutely right in that it is a bad idea to use it as a key. However, in my specific case, it works, as explained in the comment to Strilanc's post. - Yes, it won't help much, but I figured it might still be better than nothing... :-\
mafutrct
+1  A: 

When hash collisions occur, the HashTable/Dictionary calls Equals to find the key you're looking for. Using a constant hash code removes the speed advantages of using a hash in the first place - it becomes a linear search.

You're saying the Equals method hasn't been implemented according to the contract. What exactly do you mean with this? Depending on the kind of violation, the HashTable or Dictionary will merely be slow (linear search) or not work at all.

Zr40
Indeed, it is likely that many of the objects share the same hash. However, for certain reasons, this is not always the case, and a dictionary lookup should (did not measure) still have a slight advantage.
mafutrct
+1  A: 

I ran into this exact problem when writing a vector class. I wanted to compare vectors for equality, but float operations give rounding errors, so I wanted approximate equality. Long story short, overriding equals is a bad idea unless your implementation is symmetric, reflexive, and transitive.

Other classes are going to assume equals has those properties, and so will classes using those classes, and so you can end up in weird cases. For example a list might enforce uniqueness, but end up with two elements which evaluate as equal to some element B.

A hash table is the perfect example of unpredictable behavior when you break equality. For example:

//Assume a == b, b == c, but a != c
var T = new Dictionary<YourType, int>()
T[a] = 0
T[c] = 1
return T[b] //0 or 1? who knows!

Another example would be a Set:

//Assume a == b, b == c, but a != c
var T = new HashSet<YourType>()
T.Add(a)
T.Add(c)
if (T.contains(b)) then T.remove(b)
//surely T can't contain b anymore! I sure hope no one breaks the properties of equality!
if (T.contains(b)) then throw new Exception()

I suggest using another method, with a name like ApproxEquals. You might also consider overriding the == operator, because it isn't virtual and therefore won't be used accidentally by other classes like Equals could be.

If you really can't use reference equality for the hash table, don't ruin the performance of cases where you can. Add an IApproxEquals interface, implement it in your class, and add an extension method GetApprox to Dictionary which enumerates the keys looking for an approximately equal one, and returns the associated value. You could also write a custom dictionary especially for 3-dimensional vectors, or whatever you need.

Strilanc
- If you don't mind, since we seem to have the same problem, could you possibly tell me more about your story?- I carefully designed my code to be aware of the possibility of coincidence of vectors. So far, everything works fine.- I'm not sure if I understand your last paragraph correctly. I see your idea to leave Equals as inherited by Object and add a custom ApproxEquals. But what do you mean with "requires knowing the type at compile-time"?
mafutrct
I meant that Equals is virtual, and == is not. This has consequences in how they are used. The important one here is that other classes won't accidentally use your weird == operator, but they could use your weird Equals.In my project I just implemented the == operator, and so far it has worked well. Whenever I use lists or tables I've found that reference equality is what I want, or at least equivalent to what I want.
Strilanc
Regarding the Hashtable example: I currently consider this uncertainty part of the design, as sick as this may sound. There are other exploits that cannot be fixed (adding an "equal", but different value to a hashtable many times can result in a huge deviation of the original value). These cases do not occur in the program, luckily. I'll have a very careful look at your ApproxEquals idea, though.
mafutrct
You shouldn't ruin the cases where you will have reference equality just so you can use a hash table with the ones where you don't. Write a hash table designed for vectors, or add a method to dictionary to deal with approx equals.
Strilanc