Say I have a unit test that wants to compare two complex objects for equality. The objects contain many other deeply nested objects. All of the objects' classes have correctly defined equals() methods.

This isn't difficult:

@Test
public void objectEquality() {
    Object o1 = ...
    Object o2 = ...

    assertEquals(o1, o2);
}

Trouble is, if the objects are not equal, all you get is a fail, with no indication of which part of the object graph didn't match. Debugging this can be painful and frustrating.

My current approach is to make sure everything implements toString(), and then compare for equality like this:

    assertEquals(o1.toString(), o2.toString());

This makes it easier to track down test failures, since IDEs like Eclipse have a special visual comparator for displaying string differences in failed tests. Essentially, the object graphs are represented textually, so you can see where the difference is. As long as toString() is well written, it works great.

It's all a bit clumsy, though. Sometimes you want to design toString() for other purposes, like logging; maybe you only want to render some of the object's fields rather than all of them; or maybe toString() isn't defined at all, and so on.

I'm looking for ideas for a better way of comparing complex object graphs. Any thoughts?

+1  A: 

I followed the same track you are on. I also had additional troubles:

  • we can't modify classes (for equals or toString) that we don't own (JDK), arrays etc.
  • equality is sometimes different in various contexts

For example, tracking entity equality might rely on database ids when available (the "same row" concept), or on the equality of some fields (the business key) for unsaved objects. For a JUnit assertion, you might want equality of all fields.


So I ended up creating objects that run through a graph, doing their job as they go.

There is typically a superclass Crawling object:

  • crawl through all properties of the objects ; stop at:

    • enums,
    • framework classes (if applicable),
    • at unloaded proxies or distant connections,
    • at objects already visited (to avoid looping)
    • at Many-To-One relationships, if they point to a parent (usually not included in the equals semantics)
    • ...
  • configurable so that it can stop at some point (stop completely, or stop crawling inside the current property):

    • when mustStopCurrent() or mustStopCompletely() methods return true,
    • when encountering some annotations on a getter or a class,
    • when the current (class, getter) pair belongs to a list of exceptions
    • ...

From that Crawling superclass, subclasses are made for many needs:

  • For creating a debug string (calling toString as needed, with special cases for Collections and arrays that don't have a nice toString; handling a size limit; and much more).
  • For creating several Equalizers (as said before: for entities using ids, for all fields, or solely based on equals). These Equalizers often need special cases as well (for example, for classes outside your control).

Back to the question: these Equalizers could remember the path to the differing values, which would be very useful in your JUnit case for understanding the difference.

  • For creating Orderers. For example, entities need to be saved in a specific order, and saving instances of the same class together gives a huge efficiency boost.
  • For collecting a set of objects that can be found at various levels in the graph. Looping over the result of the Collector is then very easy.
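To make the "Equalizer that remembers the path" idea concrete, here is a minimal sketch using plain reflection. It is not the crawler described above: the names GraphDiff, Person and Address are hypothetical, it reads fields rather than getters, and it only implements two of the stop rules (JDK classes and enums, and objects already visited). Collections are compared with their own equals(), so the path stops at the collection itself.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical crawler that records the path of every differing value. */
final class GraphDiff {
    private final List<String> differences = new ArrayList<>();
    // Left-hand objects already crawled, tracked by identity to avoid looping on cycles.
    private final Map<Object, Object> visited = new IdentityHashMap<>();

    /** Returns one "path: left != right" entry per difference found. */
    static List<String> diff(Object left, Object right) {
        GraphDiff d = new GraphDiff();
        d.crawl("root", left, right);
        return d.differences;
    }

    private void crawl(String path, Object a, Object b) {
        if (a == b) return;
        if (a == null || b == null || a.getClass() != b.getClass()) {
            differences.add(path + ": " + a + " != " + b);
            return;
        }
        if (isLeaf(a)) {
            // Objects.deepEquals also handles arrays; otherwise it calls equals().
            if (!Objects.deepEquals(a, b)) differences.add(path + ": " + a + " != " + b);
            return;
        }
        if (visited.put(a, b) != null) return; // already compared once
        for (Class<?> c = a.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                f.setAccessible(true);
                try {
                    crawl(path + "." + f.getName(), f.get(a), f.get(b));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    // Stop rule (an assumption of this sketch): do not crawl inside enums, arrays or JDK classes.
    private static boolean isLeaf(Object o) {
        Class<?> c = o.getClass();
        return o instanceof Enum || c.isArray() || c.getName().startsWith("java.");
    }
}

// Hypothetical sample types, only here to exercise the sketch.
final class Address {
    final String city;
    Address(String city) { this.city = city; }
}

final class Person {
    final String name;
    final Address address;
    Person(String name, Address address) { this.name = name; this.address = address; }
}
```

In a test, something like assertEquals(Collections.emptyList(), GraphDiff.diff(expected, actual)) would then print the differing paths in the failure message. For two Person objects that differ only in the city ("Oslo" versus "Rome"), the list contains the single entry "root.address.city: Oslo != Rome".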


In addition, except for entities where performance is a real concern, I chose this technique to implement toString(), hashCode(), equals() and compareTo() on my entities.

For example, suppose a business key on one or more fields is defined in Hibernate via a @UniqueConstraint on the class, and all my entities have a getIdent() property implemented in a common superclass. My entities' superclass has a default implementation of these four methods that relies on this knowledge, for example (nulls need to be taken care of):

  • toString() prints "myClass(key1=value1, key2=value2)"
  • hashCode() is "value1.hashCode() ^ value2.hashCode()"
  • equals() is "value1.equals(other.value1) && value2.equals(other.value2)"
  • compareTo() combines the comparisons of the class, value1 and value2.

For entities where performance is of concern, I simply override these methods so that they do not use reflection. I can check in JUnit regression tests that the two implementations behave identically.
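A rough sketch of such a superclass, with one simplification that should be stated loudly: instead of discovering the business key by reflection, each subclass lists its key values explicitly through a hypothetical keyValues() method. KeyedEntity and Customer are made-up names, the toString() format is simpler than the one above, and compareTo() is left out to keep it short.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical base class: equality, hash code and string form all derive from the business key. */
abstract class KeyedEntity {
    /** Values making up the business key, always in the same order. Elements may be null. */
    protected abstract List<Object> keyValues();

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        // List.equals compares element by element and tolerates null elements.
        return keyValues().equals(((KeyedEntity) o).keyValues());
    }

    @Override
    public int hashCode() {
        return keyValues().hashCode(); // List.hashCode is null-safe as well
    }

    @Override
    public String toString() {
        return getClass().getSimpleName() + keyValues();
    }
}

/** Hypothetical entity whose business key is (name, country). */
final class Customer extends KeyedEntity {
    private final String name;
    private final String country;

    Customer(String name, String country) {
        this.name = name;
        this.country = country;
    }

    @Override
    protected List<Object> keyValues() {
        // Arrays.asList accepts null elements, unlike List.of.
        return Arrays.<Object>asList(name, country);
    }
}
```

A performance-sensitive entity can still override the three methods by hand, and a regression test can compare the hand-written versions against these defaults.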

KLE
+4  A: 

What you could do is render each object to XML using XStream, and then use XMLUnit to perform a comparison on the XML. If they differ, then you'll get the contextual information (in the form of an XPath, IIRC) telling you where the objects differ.

e.g. from the XMLUnit doc:

Comparing test xml to control xml [different] 
Expected element tag name 'uuid' but was 'localId' - 
comparing <uuid...> at /msg[1]/uuid[1] to <localId...> at /msg[1]/localId[1]

Note the XPath indicating the location of the differing elements.

Probably not fast, but that may not be an issue for unit tests.

Brian Agnew
+1 I'm liking this... similar approach to comparing `toString()`, without needing the `toString()`. I suspect sticking with string comparison and IDE support would be easier, however.
skaffman
I think you could easily write a utility method called assertSameDeeply() or similar, and it would be entirely generic. Just statically import it and use it like all the other JUnit stuff.
Brian Agnew
I like this solution because the comparison results are well formatted. However, I believe matt b's pointer to Hamcrest is very good and the solution smells better.
Kariem
A: 

Unit tests should have a single, well-defined thing that they test. This means that in the end there should be a single, well-defined thing that can differ between those two objects. If there are too many things that can differ, I would suggest splitting this test into several smaller tests.

Ula Krukar
I don't agree, that's not always practical. For example say I create a JPA entity, persist it, then retrieve it, and I want to test that the retrieved object is equal to the one I stored. I can only do that for the top-level objects.
skaffman
The point is that he's checking 2 objects are the same (a single comparison). Those objects may be complex, and although the assertion is trivial and correct, the diagnosis of issues when they differ is not.
Brian Agnew
His single thing is "are these objects equal"
matt b
A: 

We use a library called junitx to test the equals contract on all of our "common" objects: http://www.extreme-java.de/junitx/

The only way I can think of to test the different parts of your equals() method is to break down the information into something more granular. If you are testing a deeply-nested tree of objects, what you are doing is not truly a unit test. You need to test the equals() contract on each individual object in the graph with a separate test case for that type of object. You can use stub objects with a simplistic equals() implementation for the class-typed fields on the object under test.

HTH

RMorrisey
A: 

I would not use the toString() because as you say, it is usually more useful for creating a nice representation of the object for display or logging purposes.

It sounds to me that your "unit" test is not isolating the unit under test. If, for example, your object graph is A-->B-->C and you are testing A, your unit test for A should not care that the equals() method in C is working. Your unit test for C would make sure it works.

So I would test the following in the test for A's equals() method:

  • compare two A objects that have identical B's, in both directions, e.g. a1.equals(a2) and a2.equals(a1);
  • compare two A objects that have different B's, in both directions.

By doing it this way, with a JUnit assert for each comparison, you will know where the failure is.

Obviously if your class has more children that are part of determining equality, you would need to test many more combinations. What I'm trying to get at though is that your unit test should not care about the behavior of anything beyond the classes it has direct contact with. In my example, that means, you would assume C.equals() works correctly.
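As an illustration only, the checks for A could look like the sketch below. The classes A and B are stand-ins for the example graph, B is a stub with a deliberately simple equals(), and AEqualsChecks.check() is a hypothetical helper used here in place of JUnit's assertTrue/assertFalse so that the sketch has no dependencies.

```java
import java.util.Objects;

/** Stub for the direct child; its equals() is assumed correct and tested elsewhere. */
final class B {
    private final String id;
    B(String id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof B && Objects.equals(id, ((B) o).id);
    }

    @Override
    public int hashCode() { return Objects.hashCode(id); }
}

/** Unit under test: equality is determined by the B it holds. */
final class A {
    private final B b;
    A(B b) { this.b = b; }

    @Override
    public boolean equals(Object o) {
        return o instanceof A && Objects.equals(b, ((A) o).b);
    }

    @Override
    public int hashCode() { return Objects.hashCode(b); }
}

final class AEqualsChecks {
    // Stand-in for a JUnit assertion; the message says which comparison failed.
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    static void run() {
        A a1 = new A(new B("x"));
        A a2 = new A(new B("x"));
        A a3 = new A(new B("y"));

        // Identical B's, both directions.
        check(a1.equals(a2), "identical B's: a1.equals(a2)");
        check(a2.equals(a1), "identical B's: a2.equals(a1)");

        // Different B's, both directions.
        check(!a1.equals(a3), "different B's: a1.equals(a3)");
        check(!a3.equals(a1), "different B's: a3.equals(a1)");
    }
}
```

Because each comparison carries its own message, a failure points directly at the case that broke.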

One wrinkle may be if you are comparing collections. In that case I would use a utility for comparing collections, such as commons-collections CollectionUtils.isEqualCollection(). Of course, only for collections in your unit under test.

SingleShot
+5  A: 

The Atlassian Developer Blog had a few articles on this very subject, and on how the Hamcrest library can make debugging this kind of test failure very simple.

Basically, for an assertion like this:

assertThat(lukesFirstLightsaber, is(equalTo(maceWindusLightsaber)));

Hamcrest will give you output like this (in which only the fields that are different are shown):

Expected: is {singleBladed is true, color is PURPLE, hilt is {...}}  
but: is {color is GREEN}
matt b
+1 thanks for the links
skaffman
+2  A: 

Because of the way I tend to design complex objects, I have a very easy solution here.

When designing a complex object for which I need to write an equals method (and therefore a hashCode method), I tend to write a string renderer, and use the String class equals and hashCode methods.

The renderer, of course, is not toString: it doesn't really have to be easy for humans to read, and includes all and only the values I need to compare, and by habit I put them in the order which controls the way I'd want them to sort; none of which is necessarily true of the toString method.

Naturally, I cache this rendered string (and the hashCode value as well). It's normally private, but leaving the cached string package-private would let you see it from your unit tests.
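Roughly, the pattern looks like the sketch below. Money is a made-up immutable class, and the "|" separator is an assumption that only works if it cannot appear inside the rendered values.

```java
/** Hypothetical immutable value class whose equals/hashCode delegate to a rendered string. */
final class Money {
    private final String currency;
    private final long cents;
    private String rendered; // cached lazily; safe to cache because the fields never change

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    /** Package-private so that unit tests can compare the rendered strings directly. */
    String rendered() {
        if (rendered == null) {
            // All and only the values that take part in equality, in sort order.
            rendered = currency + "|" + cents;
        }
        return rendered;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Money && rendered().equals(((Money) o).rendered());
    }

    @Override
    public int hashCode() {
        return rendered().hashCode(); // String caches its own hash code
    }
}
```

A test in the same package can then use assertEquals(expected.rendered(), actual.rendered()) and get the IDE's string diff on failure, without touching toString().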

Incidentally, this isn't always what I end up with in delivered systems, of course - if performance testing shows that this method is too slow, I'm prepared to replace it, but that's a rare case. So far, it's only happened once, in a system in which mutable objects were being rapidly changed and frequently compared.

The reason I do this is that writing a good hashCode isn't trivial, and requires testing(*), while making use of the one in String avoids the testing.

(* Consider that step 3 in Josh Bloch's recipe for writing a good hashCode method is to test it to make sure that "equal" objects have equal hashCode values, and making sure that you've covered all possible variations isn't trivial in itself. More subtle and even harder to test well is distribution.)

CPerkins