What is the algorithm used by the memberwise equality test in .NET structs? I would like to know this so that I can use it as the basis for my own algorithm.

I am trying to write a recursive memberwise equality test for arbitrary objects (in C#) for testing the logical equality of DTOs. This is considerably easier if the DTOs are structs (since ValueType.Equals does mostly the right thing) but that is not always appropriate. I would also like to override comparison of any IEnumerable objects (but not strings!) so that their contents are compared rather than their properties.

This has proven to be harder than I would expect. Any hints will be greatly appreciated. I'll accept the answer that proves most useful or supplies a link to the most useful information.

Thanks.

+6  A: 
Abel
There *is* a default for doing it. See http://msdn.microsoft.com/en-us/library/2dts52z7.aspx
Jon Skeet
Thanks, Abel, but those are all examples where the Equals method has been overridden. I'm trying to understand the specifics of ValueType.Equals, not the implementation in its subclasses.
Damian Powell
Reflection would be suitable, and is actually what I'm heading for. My ultimate goal is to implement a reasonable memberwise equality comparison for unit testing purposes.
Damian Powell
Apologies, I took *default* for what the default approach was throughout the BCL. `ValueType.Equals` is horrendously slow, but indeed, it's an implementation of memberwise (fields!) comparison for arbitrary objects. Interesting detail: it tries to do a bitwise compare first, not sure how often that succeeds though.
Abel
Ah, reflection is ok, that makes it easier :)
Abel
On *"where the Equals method has been overridden."* >>> actually, yes and no. More importantly, the `operator ==` was overridden, which is why I chose them. The comments are on the operator, not on Equals, which is close, but not the same.
Abel
Very cool, Abel. I have implemented something that is *very* similar (see my own answer). I wonder about the equals method tests you're doing though. Won't the declaring type always be System.Object? In my own solution, I try to get the IEquatable<>.Equals method to be sure. I'm not happy with my solution though.
Damian Powell
+2  A: 

This is more complex than meets the eye. The short answer would be:

public bool MyEquals(object obj1, object obj2)
{
  if(obj1==null || obj2==null)
    return obj1==obj2;
  else if(...)
    ...  // Your custom code here
  else if(obj1.GetType().IsValueType)
    return
      obj1.GetType()==obj2.GetType() &&
      !obj1.GetType().GetFields(ALL_FIELDS).Any(field =>
       !MyEquals(field.GetValue(obj1), field.GetValue(obj2)));
  else
    return object.Equals(obj1, obj2);
}

const BindingFlags ALL_FIELDS =
  BindingFlags.Instance |
  BindingFlags.Public |
  BindingFlags.NonPublic;

However there is much more to it than that. Here are the details:

If you declare a struct and don't override .Equals(), the .NET Framework uses one of two strategies, depending on whether your struct contains only "simple" value types ("simple" is defined below):

If the struct contains only "simple" value types, a bitwise comparison is done, basically:

memcmp((byte*)&struct1, (byte*)&struct2, Marshal.SizeOf(struct1)) == 0;

If the struct contains references or non-"simple" value types, each declared field is compared as with object.Equals():

struct1.GetType()==struct2.GetType() &&
!struct1.GetType().GetFields(ALL_FIELDS).Any(field =>
  !object.Equals(field.GetValue(struct1), field.GetValue(struct2)));

What qualifies as a "simple" type? From my tests it appears to be any basic scalar type (int, long, decimal, double, etc), plus any struct that doesn't have a .Equals override and contains only "simple" types (recursively).

This has some interesting ramifications. For example, in this code:

struct DoubleStruct
{
  public double value;
}

public void TestDouble()
{
  var test1 = new DoubleStruct { value = 1 / double.PositiveInfinity };
  var test2 = new DoubleStruct { value = 1 / double.NegativeInfinity };

  bool valueEqual = test1.value.Equals(test2.value);
  bool structEqual = test1.Equals(test2);

  MessageBox.Show("valueEqual=" + valueEqual + ", structEqual=" + structEqual);
}

you would expect valueEqual to always be identical to structEqual, no matter what was assigned to test1.value and test2.value. This is not the case!

The reason for this surprising result is that double.Equals() takes into account some of the intricacies of the IEEE 754 encoding, such as multiple NaN and zero representations, but a bitwise comparison does not. Because "double" is considered a simple type, structEqual is false whenever the bits differ, even when valueEqual is true.

The above example used alternate zero representations, but this can also occur with multiple NaN values:

...
  var test1 = new DoubleStruct { value = CreateNaN(1) };
  var test2 = new DoubleStruct { value = CreateNaN(2) };
...
public unsafe double CreateNaN(byte lowByte)
{
  double result = double.NaN;
  ((byte*)&result)[0] = lowByte;
  return result;
}

In most ordinary situations this won't make a difference, but it is something to be aware of.
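
The gap between value equality and bit equality can be checked without unsafe code. The following is a minimal sketch, not part of the original tests: DoubleBitsDemo and Bits are names made up for illustration, and it uses only BitConverter. It deliberately does not call Equals on a struct, because whether the runtime takes the bitwise path for a given struct may depend on the runtime version.

```csharp
using System;

public static class DoubleBitsDemo
{
    // Hypothetical helper: returns the raw IEEE 754 bit pattern of a double.
    public static long Bits(double d)
    {
        return BitConverter.DoubleToInt64Bits(d);
    }

    public static void Main()
    {
        double positiveZero = 1 / double.PositiveInfinity;
        double negativeZero = 1 / double.NegativeInfinity;

        // double.Equals treats the two zeros as equal...
        Console.WriteLine(positiveZero.Equals(negativeZero));        // True
        // ...but their bit patterns differ in the sign bit.
        Console.WriteLine(Bits(positiveZero) == Bits(negativeZero)); // False

        // Two NaNs with different payload bits, built without pointers.
        double nan1 = BitConverter.Int64BitsToDouble(Bits(double.NaN) | 1);
        double nan2 = BitConverter.Int64BitsToDouble(Bits(double.NaN) | 2);

        // double.Equals considers any two NaNs equal.
        Console.WriteLine(nan1.Equals(nan2));                        // True
    }
}
```

A bitwise comparison of two structs wrapping these values sees only the differing bits, which is why it can disagree with double.Equals().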

Ray Burns
+1  A: 

Here's my own attempt at this problem. It works, but I'm not convinced I've covered all the bases.

public class MemberwiseEqualityComparer : IEqualityComparer
{
    public bool Equals(object x, object y)
    {
        // ----------------------------------------------------------------
        // 1. If exactly one is null, return false.
        // 2. If they are the same reference, then they must be equal by
        //    definition.
        // 3. If the objects are both IEnumerable, return the result of
        //    comparing each item.
        // 4. If the objects are equatable, return the result of comparing
        //    them.
        // 5. If the objects are different types, return false.
        // 6. Iterate over the public properties and fields and compare
        //    them. If there is a pair that are not equal, return false.
        // 7. Return true.
        // ----------------------------------------------------------------

        //
        // 1. If exactly one is null, return false.
        //
        if (null == x ^ null == y) return false;

        //
        // 2. If they are the same reference, then they must be equal by
        //    definition.
        //
        if (object.ReferenceEquals(x, y)) return true;

        //
        // 3. If the objects are both IEnumerable, return the result of
        //    comparing each item.
        // For collections, we want to compare the contents rather than
        // the properties of the collection itself so we check if the
        // classes are IEnumerable instances before we check to see that
        // they are the same type.
        //
        if (x is IEnumerable && y is IEnumerable && false == x is string)
        {
            return contentsAreEqual((IEnumerable)x, (IEnumerable)y);
        }

        //
        // 4. If the objects are equatable, return the result of comparing
        //    them.
        // We are assuming that the type of X implements IEquatable<> of itself
        // (see below) which is true for the numeric types and string.
        // e.g.: public class TypeOfX : IEquatable<TypeOfX> { ... }
        //
        var xType = x.GetType();
        var yType = y.GetType();
        var equatableType = typeof(IEquatable<>).MakeGenericType(xType);
        if (equatableType.IsAssignableFrom(xType)
            && xType.IsAssignableFrom(yType))
        {
            return equatablesAreEqual(equatableType, x, y);
        }

        //
        // 5. If the objects are different types, return false.
        //
        if (xType != yType) return false;

        //
        // 6. Iterate over the public properties and fields and compare
        //    them. If there is a pair that are not equal, return false.
        //
        if (false == propertiesAndFieldsAreEqual(x, y)) return false;

        //
        // 7. Return true.
        //
        return true;
    }

    public int GetHashCode(object obj)
    {
        return null != obj ? obj.GetHashCode() : 0;
    }

    private bool contentsAreEqual(IEnumerable enumX, IEnumerable enumY)
    {
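        // Cast<object>() keeps null elements; OfType<object>() would
        // silently drop them and skew both the count and the pairing.
        var enumOfObjX = enumX.Cast<object>();
        var enumOfObjY = enumY.Cast<object>();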

        if (enumOfObjX.Count() != enumOfObjY.Count()) return false;

        var contentsAreEqual = enumOfObjX
            .Zip(enumOfObjY) // Custom Zip extension which returns
                             // Pair<TFirst,TSecond>. Similar to .NET 4's Zip
                             // extension.
            .All(pair => Equals(pair.First, pair.Second))
            ;

        return contentsAreEqual;
    }

    private bool equatablesAreEqual(Type equatableType, object x, object y)
    {
        var equalsMethod = equatableType.GetMethod("Equals");
        var equal = (bool)equalsMethod.Invoke(x, new[] { y });
        return equal;
    }

    private bool propertiesAndFieldsAreEqual(object x, object y)
    {
        const BindingFlags bindingFlags
            = BindingFlags.Public | BindingFlags.Instance;

        var propertyValues = from pi in x.GetType()
                                         .GetProperties(bindingFlags)
                                         .AsQueryable()
                             where pi.CanRead
                             select new
                             {
                                 Name   = pi.Name,
                                 XValue = pi.GetValue(x, null),
                                 YValue = pi.GetValue(y, null),
                             };

        var fieldValues = from fi in x.GetType()
                                      .GetFields(bindingFlags)
                                      .AsQueryable()
                          select new
                          {
                              Name   = fi.Name,
                              XValue = fi.GetValue(x),
                              YValue = fi.GetValue(y),
                          };

        var propertiesAreEqual = propertyValues.Union(fieldValues)
            .All(v => Equals(v.XValue, v.YValue))
            ;

        return propertiesAreEqual;
    }
}
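
For step 3, one alternative to Count() plus a custom Zip is to walk both enumerators in lockstep, so each sequence is enumerated only once. This is a sketch under assumptions rather than a drop-in replacement: SequenceComparison, ContentsAreEqual and the itemsEqual callback are hypothetical names, and the callback stands in for the recursive Equals above.

```csharp
using System;
using System.Collections;

public static class SequenceComparison
{
    // Compares two sequences pairwise using the supplied item comparer.
    // Returns false as soon as the lengths or any pair of items differ.
    public static bool ContentsAreEqual(
        IEnumerable first, IEnumerable second,
        Func<object, object, bool> itemsEqual)
    {
        IEnumerator e1 = first.GetEnumerator();
        IEnumerator e2 = second.GetEnumerator();
        try
        {
            while (true)
            {
                bool has1 = e1.MoveNext();
                bool has2 = e2.MoveNext();
                if (has1 != has2) return false; // one sequence is longer
                if (!has1) return true;         // both ended together
                if (!itemsEqual(e1.Current, e2.Current)) return false;
            }
        }
        finally
        {
            // Non-generic enumerators are not required to be disposable.
            (e1 as IDisposable)?.Dispose();
            (e2 as IDisposable)?.Dispose();
        }
    }
}
```

For example, SequenceComparison.ContentsAreEqual(new[] { 1, 2, 3 }, new List<int> { 1, 2, 3 }, object.Equals) compares an array with a list by content, and null elements are passed to the callback rather than being skipped.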
Damian Powell
+1  A: 

This is the implementation of ValueType.Equals from the Shared Source Common Language Infrastructure (version 2.0).

public override bool Equals (Object obj) {
    BCLDebug.Perf(false, "ValueType::Equals is not fast.  "+
        this.GetType().FullName+" should override Equals(Object)");
    if (null==obj) {
        return false;
    }
    RuntimeType thisType = (RuntimeType)this.GetType();
    RuntimeType thatType = (RuntimeType)obj.GetType();

    if (thatType!=thisType) {
        return false;
    }

    Object thisObj = (Object)this;
    Object thisResult, thatResult;

    // if there are no GC references in this object we can avoid reflection 
    // and do a fast memcmp
    if (CanCompareBits(this))
        return FastEqualsCheck(thisObj, obj);

    FieldInfo[] thisFields = thisType.GetFields(
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

    for (int i=0; i<thisFields.Length; i++) {
        thisResult = ((RtFieldInfo)thisFields[i])
            .InternalGetValue(thisObj, false);
        thatResult = ((RtFieldInfo)thisFields[i])
            .InternalGetValue(obj, false);

        if (thisResult == null) {
            if (thatResult != null)
                return false;
        }
        else
        if (!thisResult.Equals(thatResult)) {
            return false;
        }
    }

    return true;
}

It's interesting to note that this is pretty much exactly the code that is shown in Reflector. That surprised me because I thought that the SSCLI was just a reference implementation, not the final library. Then again, I suppose there is a limited number of ways to implement this relatively simple algorithm.
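
The type check and the field-by-field branch of the ValueType.Equals listing can be observed from ordinary C#. The structs below are made-up examples, not from the SSCLI; each has a string field, so by the listing's own comment (GC references present) the memcmp shortcut should not apply to them.

```csharp
using System;

public struct Person        // no Equals override, so ValueType.Equals is used
{
    public string Name;     // reference-type field
    public int Age;
}

public struct Employee      // same shape as Person, but a different type
{
    public string Name;
    public int Age;
}

public static class ValueTypeEqualsDemo
{
    public static void Main()
    {
        var a = new Person { Name = "Ann", Age = 30 };
        // Build the string at run time so that the two Name fields are
        // distinct string instances with identical contents.
        var b = new Person { Name = new string(new[] { 'A', 'n', 'n' }), Age = 30 };
        var c = new Employee { Name = "Ann", Age = 30 };

        Console.WriteLine(a.Equals(b));    // True: each field compared via Equals
        Console.WriteLine(a.Equals(c));    // False: run-time types differ
        Console.WriteLine(a.Equals(null)); // False
    }
}
```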

The parts that I wanted to understand more are the calls to CanCompareBits and FastEqualsCheck. These are both implemented as native methods, but their code is also included in the SSCLI. As you can see from the implementations below, the CLI looks at the definition of the object's class (via its method table) to see if it contains pointers to reference types and how the memory for the object is laid out. If there are no references and the object is contiguous, then the memory is compared directly using the C function memcmp.

// Return true if the valuetype does not contain pointer and is tightly packed
FCIMPL1(FC_BOOL_RET, ValueTypeHelper::CanCompareBits, Object* obj)
{
    WRAPPER_CONTRACT;
    STATIC_CONTRACT_SO_TOLERANT;

    _ASSERTE(obj != NULL);
    MethodTable* mt = obj->GetMethodTable();
    FC_RETURN_BOOL(!mt->ContainsPointers() && !mt->IsNotTightlyPacked());
}
FCIMPLEND

FCIMPL2(FC_BOOL_RET, ValueTypeHelper::FastEqualsCheck, Object* obj1,
    Object* obj2)
{
    WRAPPER_CONTRACT;
    STATIC_CONTRACT_SO_TOLERANT;

    _ASSERTE(obj1 != NULL);
    _ASSERTE(obj2 != NULL);
    _ASSERTE(!obj1->GetMethodTable()->ContainsPointers());
    _ASSERTE(obj1->GetSize() == obj2->GetSize());

    TypeHandle pTh = obj1->GetTypeHandle();

    FC_RETURN_BOOL(memcmp(obj1->GetData(),obj2->GetData(),pTh.GetSize()) == 0);
}
FCIMPLEND
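
For a home-grown comparer, the "no references" half of CanCompareBits can be roughly approximated with reflection. This is only a sketch under assumptions: LayoutInspector and ContainsReferences are hypothetical names, the code is not how the CLR implements ContainsPointers, and it does not attempt the tight-packing check at all.

```csharp
using System;
using System.Reflection;

public static class LayoutInspector
{
    private const BindingFlags InstanceFields =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    // Hypothetical helper: true if the type is not a value type, or is a
    // struct that holds a non-value-type field directly or through nested
    // structs.
    public static bool ContainsReferences(Type type)
    {
        // Classes, interfaces, arrays and delegates land here. Pointer
        // types do too, which is a deliberately conservative choice.
        if (!type.IsValueType) return true;

        // Primitives and enums hold no references. Checking them first
        // also stops the recursion: Int32 has a field of type Int32.
        if (type.IsPrimitive || type.IsEnum) return false;

        foreach (FieldInfo field in type.GetFields(InstanceFields))
        {
            if (ContainsReferences(field.FieldType)) return true;
        }
        return false;
    }
}
```

For instance, LayoutInspector.ContainsReferences(typeof(Guid)) is false, while a KeyValuePair<int, string> contains a string and so reports true.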

If I weren't quite so lazy, I might look into the implementation of ContainsPointers and IsNotTightlyPacked. However, I've definitely found out what I wanted to know (and I am lazy), so that's a job for another day.

Damian Powell