views:

7702

answers:

5

I'm trying to write an automated test of an application that basically translates a custom message format into an XML message and sends it out the other end. I've got a good set of input/output message pairs so all I need to do is send the input messages in and listen for the XML message to come out the other end.

When it comes time to compare the actual output to the expected output, I'm running into some problems. My first thought was just to do string comparisons on the expected and actual messages. This doesn't work very well because the example data we have isn't always formatted consistently, and different aliases are often used for the XML namespace (and sometimes namespaces aren't used at all).

I know I can parse both strings and then walk through each element and compare them myself and this wouldn't be too difficult to do, but I get the feeling there's a better way or a library I could leverage.

So, boiled down, the question is:

Given two Java Strings which both contain valid XML, how would you go about determining if they are semantically equivalent? Bonus points if you have a way to determine what the differences are.

+7  A: 

XOM (http://xom.nu) has a Canonicalizer utility which turns your documents into a regular (canonical) form, which you can then stringify and compare. So regardless of whitespace irregularities inside tags or attribute ordering, you can get regular, predictable comparisons of your documents.

This works especially well in IDEs that have dedicated visual String comparators, like Eclipse. You get a visual representation of the semantic differences between the documents.

skaffman
+23  A: 

Sounds like a job for XMLUnit

http://xmlunit.sourceforge.net/

Tom
I knew something like this had to be out there. I can't believe Google didn't find it for me. Thanks.
Mike Deck
I've had problems with XMLUnit in the past; it's been hyper-twitchy with XML API versions and hasn't proven reliable. It's been a while since I ditched it for XOM, though, so maybe it's improved since.
skaffman
A: 

skaffman seems to be giving a good answer.

Another way is probably to format both strings with a command-line utility like xmlstarlet (http://xmlstar.sourceforge.net/) and then use any diff utility (or library) to diff the resulting output files. I don't know if this is a good solution when the issues are with namespaces.

anjanb
A: 

Since you say "semantically equivalent", I assume you mean that you want to do more than just literally verify that the XML outputs are equal as strings, and that you'd want something like

<foo> some stuff here</foo>

and

<foo>some stuff here</foo>

do read as equivalent. Ultimately it's going to matter how you're defining "semantically equivalent" on whatever object you're reconstituting the message from. Why not simply build that object from the messages and use a custom equals() to define what you're looking for?
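As a rough illustration of defining the equivalence yourself, here is a minimal sketch that uses only the JDK's DOM parser. The class name `XmlSemanticDiff` and the rules it applies are assumptions made for the example, not anything from the question: elements are matched by namespace URI plus local name (so prefix aliases don't matter), attribute order is ignored, text directly inside an element is trimmed before comparing, and child elements must appear in the same order. It returns a list of differences, so an empty list means "equivalent" under those rules.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; the equivalence rules are assumptions chosen for this sketch.
final class XmlSemanticDiff {

    /** Returns human-readable differences; an empty list means no difference was found. */
    static List<String> diff(String expectedXml, String actualXml) {
        List<String> out = new ArrayList<>();
        compare(parse(expectedXml), parse(actualXml), "", out);
        return out;
    }

    private static Element parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);   // needed to see namespace URIs instead of prefixes
            f.setCoalescing(true);       // treat CDATA sections as ordinary text
            f.setIgnoringComments(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Name that is independent of the prefix: {namespace-uri}local-name.
    private static String name(Node n) {
        String ns = n.getNamespaceURI();
        return (ns == null ? "" : "{" + ns + "}") + n.getLocalName();
    }

    // Attributes sorted by name, skipping xmlns declarations.
    private static Map<String, String> attributes(Element e) {
        Map<String, String> m = new TreeMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            String raw = a.getNodeName();
            if (raw.equals("xmlns") || raw.startsWith("xmlns:")) continue;
            m.put(name(a), a.getNodeValue());
        }
        return m;
    }

    // Concatenated text of the direct text children, trimmed.
    private static String text(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE || c.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    private static List<Element> children(Element e) {
        List<Element> list = new ArrayList<>();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) list.add((Element) c);
        }
        return list;
    }

    private static void compare(Element exp, Element act, String parentPath, List<String> out) {
        if (!name(exp).equals(name(act))) {
            out.add((parentPath.isEmpty() ? "/" : parentPath) + ": expected element "
                    + name(exp) + " but found " + name(act));
            return;
        }
        String path = parentPath + "/" + name(exp);
        Map<String, String> ea = attributes(exp), aa = attributes(act);
        if (!ea.equals(aa)) out.add(path + ": attributes differ, expected " + ea + " but found " + aa);
        if (!text(exp).equals(text(act))) {
            out.add(path + ": text differs, expected '" + text(exp) + "' but found '" + text(act) + "'");
        }
        List<Element> ec = children(exp), ac = children(act);
        if (ec.size() != ac.size()) {
            out.add(path + ": expected " + ec.size() + " child elements but found " + ac.size());
            return;
        }
        for (int i = 0; i < ec.size(); i++) compare(ec.get(i), ac.get(i), path, out);
    }
}
```

In a test you would assert that `XmlSemanticDiff.diff(expected, actual)` is empty and print the list on failure. Note that under these rules a document that uses a namespace and one that uses none at all are reported as different; if that case should pass, comparing local names only would be one way to loosen the check.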

Steve B.
A: 

I'm using Altova DiffDog, which has options to compare XML files structurally (ignoring string data).

This means that (with the 'ignore text' option checked):

<foo a="xxx" b="xxx">xxx</foo>

and

<foo b="yyy" a="yyy">yyy</foo>

are equal in the sense that they have structural equality. This is handy if you have example files that differ in data, but not structure!
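For anyone who wants the same kind of structure-only check from code, here is a small sketch using only the JDK. It is not DiffDog's algorithm, just one assumed interpretation of "structural equality": the hypothetical `XmlShape.shape` method reduces each element to its name, its sorted attribute names, and the shapes of its child elements, dropping all text and attribute values, and two documents match when their shapes are identical.

```java
import java.io.StringReader;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; "structure" here means element names, attribute names and nesting only.
final class XmlShape {

    static boolean sameStructure(String xmlA, String xmlB) {
        return shape(parse(xmlA)).equals(shape(parse(xmlB)));
    }

    // Builds a signature such as foo[a, b](bar[]()) that ignores text and attribute values.
    static String shape(Element e) {
        TreeSet<String> attrNames = new TreeSet<>();   // sorted, so attribute order is irrelevant
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            String raw = a.getNodeName();
            if (raw.equals("xmlns") || raw.startsWith("xmlns:")) continue; // skip namespace declarations
            attrNames.add(name(a));
        }
        StringBuilder sb = new StringBuilder(name(e)).append(attrNames).append('(');
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) sb.append(shape((Element) c));
        }
        return sb.append(')').toString();
    }

    // Prefix-independent name: {namespace-uri}local-name.
    private static String name(Node n) {
        String ns = n.getNamespaceURI();
        return (ns == null ? "" : "{" + ns + "}") + n.getLocalName();
    }

    private static Element parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

With the two `<foo>` examples above, `XmlShape.sameStructure(...)` returns true, while adding, removing, or renaming an element or attribute makes it return false.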

Skipperkongen
The only downside is that it is not free (99€ for a pro license), though there is a 30-day trial.
Skipperkongen
I have found only the utility (http://www.altova.com/diffdog/diff-merge-tool.html); it would be nice to have a library.
dma_k