views:

409

answers:

7

I have an immutable Value Object, IPathwayModule, whose value is defined by:

  • (int) Block;
  • (Entity) Module, identified by (string) ModuleId;
  • (enum) Status; and
  • (entity) Class, identified by (string) ClassId - which may be null.

Here's my current IEqualityComparer implementation which seems to work in a few unit tests. However, I don't think I understand what I'm doing well enough to know whether I am doing it right. A previous implementation would sometimes fail on repeated test runs.

private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
    public bool Equals(IPathwayModule x, IPathwayModule y)
    {
        int hx = GetHashCode(x);
        int hy = GetHashCode(y);
        return hx == hy;
    }

    public int GetHashCode(IPathwayModule obj)
    {
        int h;
        if (obj.Class != null)
        {
            h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + obj.Class.ClassId.GetHashCode();
        }
        else
        {
            h = obj.Block.GetHashCode() + obj.Module.ModuleId.GetHashCode() + obj.Status.GetHashCode() + "NOCLASS".GetHashCode();
        }
        return h;
    }
}

IPathwayModule is definately immutable and different instances with the same values should be equal and produce the same HashCode since they are used as items within HashSets.

I suppose my questions are:

  • Am I using the interface correctly in this case?
  • Are there cases where I might not see the desired behaviour?
  • Is there any way to improve the robustness, performance?
  • Are there any good practices that I am not following?
+2  A: 

You shouldn't use GetHashCode() as the main way of comparison objects. Compare it field-wise.

There could be multiple objects with the same hash code (this is called 'hash code collisions').

Also, be careful when add together multiple integer values, since you can easily cause an OverflowException. Use 'exclusive or' (^) to combine hashcodes or wrap code into 'unchecked' block.

elder_george
+3  A: 

Don't do the Equals in terms of the Hash function's results it's too fragile. Rather do a field value comparison for each of the fields. Something like:

return x != null && y != null && x.Name.Equals(y.Name) && x.Type.Equals(y.Type) ...

Also, the hash functions results aren't really amenable to addition. Try using the ^ operator instead.

return obj.Name.GetHashCode() ^ obj.Type.GetHashCode() ...

You don't need the null check in GetHashCode. If that value is null, you've got bigger problems, no use trying to recover from something over which you have no control...

Pieter Breed
Can you explain your last point a bit further? I need to generate a HashCode which would be based on .Class.ClassId but only if .Class is not null.
Daniel Skinner
I'm coming from a moral authority standpoint here, so feel free to disregard :) If you want to be storing nulls in your collection (in my opinion this is a bad design and this is the point I'm implicitly arguing against) then you will have to do a null check here. But then for all values of null you should return the same value, 0 is prolly easiest.
Pieter Breed
In my domain the null seems to make sense. For example, if a Dog does not have a Home then what's wrong with Dog.Home == null?
Daniel Skinner
Dog.Home may be null, yes, but should the collection of all Homes have a null Home in it? The reason I'm talking about collections is that the IEqualityComparer has a primary use-case of being applied to collections and I think more specifically to untyped collections.
Pieter Breed
In my domain Pathway is a (Hash)Set of IPathwayModule's and a Pathway would never have a null item in it's set. The IEqualityComparer is being used by the HashSet to determine equality of it's items thus preventing duplicates been added to the set.
Daniel Skinner
+1  A: 

You should implement better versions of Equals and GetHashCode.

For instance, the hash code of enums is simply their numerical value.

In other words, with these two enums:

public enum A { x, y, z }
public enum B { k, l, m }

Then with your implementation, the following value type:

public struct AB {
    public A;
    public B;
}

the following two values would be considered equal:

AB ab1 = new AB { A = A.x, B = B.m };
AB ab2 = new AB { A = A.z, B = B.k };

I'm assuming you don't want that.

Also, passing the value types as interfaces will box them, this could have performance concerns, although probably not much. You might consider making the IEqualityComparer implementation take your value types directly.

Lasse V. Karlsen
IPathwayModule and those classes that derive from this are implemented as classes, not structs for performance reasons. Is your performance concern with boxing still valid in this case?
Daniel Skinner
Well, you are saying "value objects". I was thinking "Value types", which implies structs.
Lasse V. Karlsen
I was using "value object" in the DDD sense. It seems to be the norm to implement them as classes provided that equality override to work like a value type.
Daniel Skinner
+2  A: 

The only big problem is the implementation of Equals. Hash codes are not unique, you can get the same hash code for objects which are different. You should compare each field of IPathwayModule individually.

GetHashCode() can be improved a bit. You don't need to call GetHashCode() on an int. The int itself is a good hash code. The same for enum values. Your GetHashCode could be then implemented like this:

public int GetHashCode(IPathwayModule obj)
{
    unchecked {
        int h = obj.Block + obj.Module.ModeleId.GetHashCode() + (int) obj.Status;
        if (obj.class != null)
           h += obj.Class.ClassId.GetHashCode();
        return h;
    }
}

The 'unchecked' block is necessary because there may be overflows in the arithmetic operations.

Lluis Sanchez
+1  A: 

If I understand you well, you'd like to hear some comments on your code. Here're my remarks:

  1. GetHashCode should be XOR'ed together, not added. XOR (^) gives a better chance of preventing collisions
  2. You compare hashcodes. That's good, but only do this if the underlying object overrides the GetHashCode. If not, use properties and their hashcodes and combine them.
  3. Hash codes are important, they make a quick compare possible. But if hash codes are equal, the object can still be different. This happens rarely. But you'll need to compare the fields of your object if hash codes are equal.
  4. You say your value types are immutable, but you reference objects (.Class), which are not immutable
  5. Always optimize comparison by adding reference comparison as first test. References unequal, the objects are unequal, then the structs are unequal.

Point 5 depends on whether the you want the objects that you reference in your value type to return not equal when not the same reference.

EDIT: you compare many strings. The string comparison is optimized in C#. You can, as others suggested, better use == with them in your comparison. For the GetHashCode, use OR ^ as suggested by others as well.

Abel
4. IPathwayModule is implmented as a class (reference type) for performance reasons. Yes, the value of IPathwayModule does depend on reference objects (e.g. Class, Module) but it depdends on the identity field of these entities which can be assumed to be unchanging.5. Will do. Since IPathwayModule is implemented as a class, ReferenceEquals would immediately imply equality. In addition, two different references with the same value should be .Equals() and will ideally share the same HashCode to ensure any HashSet<IPathwayModule> does not contain duplicates.
Daniel Skinner
No, don't OR hash codes. Just a few of them and all your hash codes will be 0xffffffff. Use XOR
erikkallen
that's what I meant, thanks for pointing it out. See my edit. I'll make it clearer.
Abel
+1  A: 
  1. Assuming that two objects are equal because their hash code is equal is wrong. You need to compare all members individually
  2. It is proabably better to use ^ rather than + to combine the hash codes.
erikkallen
A: 

Thanks to all who responded. I have aggregated the feedback from everyone who responded and my improved IEqualityComparer now looks like:

private class StandardPathwayModuleComparer : IEqualityComparer<IPathwayModule>
{
    public bool Equals(IPathwayModule x, IPathwayModule y)
    {
        if (x == y) return true;
        if (x == null || y == null) return false;

        if ((x.Class == null) ^ (y.Class == null)) return false;

        if (x.Class == null) //and implicitly y.Class == null
        {
            return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId);
        }
        return x.Block.Equals(y.Block) && x.Status.Equals(y.Status) && x.Module.ModuleId.Equals(y.Module.ModuleId) && x.Class.ClassId.Equals(y.Class.ClassId);
    }
    public int GetHashCode(IPathwayModule obj)
    {
        unchecked {
            int h = obj.Block ^ obj.Module.ModuleId.GetHashCode() ^ (int) obj.Status;
            if (obj.Class != null)
            {
               h ^= obj.Class.ClassId.GetHashCode();
            }
            return h;
        }
    }
}
Daniel Skinner
You can rewrite your `x/y.Class == null` test as `(x.Class ?? y.Class) == null` which can be slightly briefer and clearer (`??` is the null-coalescing operator and returns left operand if it's not null, otherwise right operand).
Abel
In addition, there's an error: you test for `x.Class` first, which will throw an exception if `x` is null. Later you test for `x == null`. This code will never be reached if `x` is indeed `null`. Before you do anything at all, test for `null`: if either is `null`, they are unequal. This also rewrites your `ReferenceEquals` test: `if(x == y) return true; elseif(x == null || y == null) return false;`
Abel
But the case where x.Class is not null and y.Class is null is not captured by this expression.
Daniel Skinner
Thanks Abel, I have made the change in your second comment.
Daniel Skinner