views:

367

answers:

2

Basically, I have the following so far:

class Foo {
    public override bool Equals(object obj)
    {
        Foo d = obj as Foo ;
        if (d == null)
            return false;

        return this.Equals(d);
    }

    #region IEquatable<Foo> Members

    public bool Equals(Foo other)
    {
        if (this.Guid != String.Empty && this.Guid == other.Guid)
            return true;
        else if (this.Guid != String.Empty || other.Guid != String.Empty)
            return false;

        if (this.Title == other.Title &&
            this.PublishDate == other.PublishDate &&
            this.Description == other.Description)
            return true;

        return false;
    }
}

So, the problem is this: I have a non-required field Guid, which is a unique identifier. If this isn't set, then I need to try to determine equality based on less accurate metrics as an attempt at determining if two objects are equal. This works fine, but it make GetHashCode() messy... How should I go about it? A naive implementation would be something like:

public override int GetHashCode() {
    if (this.Guid != String.Empty)
        return this.Guid.GetHashCode();

    int hash = 37;
    hash = hash * 23 + this.Title.GetHashCode();
    hash = hash * 23 + this.PublishDate.GetHashCode();
    hash = hash * 23 + this.Description.GetHashCode();
    return hash;
}

But what are the chances of the two types of hash colliding? Certainly, I wouldn't expect it to be 1 in 2 ** 32. Is this a bad idea, and if so, how should I be doing it?

+4  A: 

A very easy hash code method for custom classes is to bitwise XOR each of the fields' hash codes together. It can be as simple as this:

int hash = 0;
hash ^= this.Title.GetHashCode();
hash ^= this.PublishDate.GetHashCode();
hash ^= this.Description.GetHashCode();
return hash;

From the link above:

XOR has the following nice properties:

  • It does not depend on order of computation.
  • It does not “waste” bits. If you change even one bit in one of the components, the final value will change.
  • It is quick, a single cycle on even the most primitive computer.
  • It preserves uniform distribution. If the two pieces you combine are uniformly distributed so will the combination be. In other words, it does not tend to collapse the range of the digest into a narrower band.

XOR doesn't work well if you expect to have duplicate values in your fields as duplicate values will cancel each other out when XORed. Since you're hashing together three unrelated fields that should not be a problem in this case.

John Kugelman
XOR not depending on the order of computation is a two-edged sword... if you have objects with multiple fields of the same type (for example, two dates), then when these are swapped around the objects will 'look the same' to the hash.
jerryjvl
+1  A: 

I don't think there is a problem with the approach you have chosen to use. Worrying 'too much' about hash collisions is almost always an indication of over-thinking the problem; as long as the hash is highly likely to be different you should be fine.

Ultimately you may even want to consider leaving out the Description from your hash anyway if it is reasonable to expect that most of the time objects can be distinguished based on their title and publication date (books?).

You could even consider disregarding the GUID in your hash function altogether, and only use it in the Equals implementation to disambiguate the unlikely(?) case of hash clashes.

jerryjvl
Altho, obviously, the GUID if present, is likely to hash a lot quicker than an arbitrary title string... so it could be a feasible performance optimisation.
jerryjvl
Description needs to be included in equality (and hence in the hash code)
Matthew Scharley
Oh, and for the record, RSS items.
Matthew Scharley
Note that it is perfectly valid for 'Description' to be in 'Equals', but not in 'GetHashCode' ... Your hashing function is allowed to be weaker than the equality test, as long as the hashes of equal items can never be different
jerryjvl