I have a heap of unit tests that need to check XML outputs.

I started out comparing strings, but that isn't going to scale: formatting and other superficial differences get in the way.

What is the easiest way in .NET to evaluate whether the generated XML is semantically the same as what the test expects?

Closed as duplicate of How would you compare two XML Documents?

+3  A: 

Microsoft offers its XML Diff tools/classes here. I haven't personally used it, but it sounds like it'll get you started:

"By using the XMLDiff class, the programmer is able to determine if the two files are in fact different based on the conditions that are important to their application"

It seems to cope with different ordering, spacing, namespace prefixes etc.

Xiaofu
+1  A: 

Same question here.

bruno conde
A: 

This is one of those problems that sounds like it's going to be easy to start with, but the more you dig the more depth you find in the problem space.

There are a number of pre-existing tools that will do XML diffs, both as GUIs (along the same lines as textual diff tools) and as command-line tools or components (which are more what you'd be after). XMLDiff is one such case, as has already been mentioned.

The troubles start when you ask questions like: what do I want it to output? Do you just want a return code that says whether they are the same or different (for unit test purposes this may actually be enough), or do you want a report that tells you what the differences are (which could also be useful in unit tests, to find out what the problem is)? If the latter, how do you want that information? Do you want edit distance? Do you want it to interpret numeric values and tell you the difference between them?
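To make the "return code versus report" choice concrete, here is a rough sketch of a comparison that returns a list of differences; an empty list doubles as the "same" return code. It is Python rather than .NET, purely as a language-neutral illustration, and `xml_differences` is a name I made up, not part of XMLDiff or any other tool. It assumes attribute order and leading/trailing whitespace don't matter, and it ignores mixed-content tail text.

```python
import xml.etree.ElementTree as ET

def xml_differences(expected, actual, path=""):
    """Return human-readable differences between two elements (empty list = same)."""
    path = f"{path}/{expected.tag}"
    if expected.tag != actual.tag:
        # Different element names: no point descending any further.
        return [f"{path}: expected element <{expected.tag}>, found <{actual.tag}>"]
    diffs = []
    # attrib is a dict, so attribute order is ignored by this comparison.
    if expected.attrib != actual.attrib:
        diffs.append(f"{path}: attributes {dict(expected.attrib)} != {dict(actual.attrib)}")
    # Assumption: leading/trailing whitespace in text is formatting noise.
    exp_text = (expected.text or "").strip()
    act_text = (actual.text or "").strip()
    if exp_text != act_text:
        diffs.append(f"{path}: text {exp_text!r} != {act_text!r}")
    if len(expected) != len(actual):
        diffs.append(f"{path}: {len(expected)} child elements != {len(actual)}")
    # Children are compared pairwise, in document order.
    for exp_child, act_child in zip(expected, actual):
        diffs.extend(xml_differences(exp_child, act_child, path))
    return diffs

expected = ET.fromstring('<order id="7"><item>pen</item><qty>2</qty></order>')
actual = ET.fromstring('<order id="7">\n  <item>pen</item>\n  <qty>3</qty>\n</order>')
report = xml_differences(expected, actual)
```

In a unit test you would assert that `report` is empty and print it on failure, which gets you both the pass/fail answer and the "what went wrong" report.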

What about node ordering? Must child nodes appear in a particular order, or is it OK if they are the same nodes in a different order?

You'll probably also want to be able to specify what to compare. Should namespaces match? Is whitespace significant anywhere? Are there certain nodes you always want to ignore (e.g. a "time" attribute), or do you want finer grained control over exactly which nodes get compared and which don't?

For numerical comparisons do you want to allow for tolerances? For textual comparisons (text nodes), is whitespace within the text significant? What about capitalisation?
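As an illustration of how those leaf-level choices might look once you pin them down, here is a small sketch of a value comparison with a numeric tolerance and optional whitespace and case folding. Again this is Python as a stand-in, not .NET code; `values_match` and all of its parameters are hypothetical names, and the rule "treat both values as numbers if both parse as floats" is my own assumption rather than anything a particular tool does.

```python
import math

def values_match(expected, actual, *, tolerance=0.0,
                 ignore_case=False, collapse_whitespace=True):
    """Compare two text values under explicit, caller-chosen rules."""
    if collapse_whitespace:
        # Trim the ends and squeeze internal runs of whitespace to one space.
        expected = " ".join(expected.split())
        actual = " ".join(actual.split())
    try:
        # Assumption: if both sides parse as floats, compare them numerically,
        # so "1.50" and "1.5" match even with a zero tolerance.
        return math.isclose(float(expected), float(actual),
                            rel_tol=0.0, abs_tol=tolerance)
    except ValueError:
        pass  # Not both numeric: fall through to a textual comparison.
    if ignore_case:
        expected, actual = expected.casefold(), actual.casefold()
    return expected == actual
```

For example, `values_match("1.5", "1.6", tolerance=0.2)` passes while `values_match("1.5", "1.6")` does not, and `values_match("Hello  World", "hello world", ignore_case=True)` passes only because both the whitespace and case rules are switched on.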

And you can go on and on (as I did recently in an analysis for just such a project where I work).

Each tool addresses these issues to different degrees and in different ways.

You might decide to keep it as simple as possible and go for something that does a direct, node-by-node comparison with no interpretation, and at the end tells you whether the documents are the same or different. I believe XMLDiff will give you that (and a little more).

Also worth thinking about: if you do want to handle cases such as different node orderings, or ignoring certain branches, you could apply an XSLT transform to both the expected and the generated document before the comparison, to normalise them according to your rules.
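Here is a sketch of that normalise-then-compare idea. Note the swap: Python's standard library has no XSLT processor, so a plain function stands in for the transform; the principle (rewrite both documents to a canonical form, then do a dumb comparison) is the same. `normalise`, `canonical`, `ignore_attrs` and `unordered` are hypothetical names of mine, and the sketch drops mixed-content tail text, which may not suit your documents.

```python
import xml.etree.ElementTree as ET

def normalise(elem, ignore_attrs=frozenset(), unordered=False):
    """Build a copy of elem with the 'noise' removed according to the rules."""
    # Drop ignored attributes and insert the rest in sorted order.
    attrs = {k: v for k, v in sorted(elem.attrib.items()) if k not in ignore_attrs}
    out = ET.Element(elem.tag, attrs)
    # Trim text; whitespace-only text (indentation) becomes no text at all.
    out.text = (elem.text or "").strip() or None
    children = [normalise(child, ignore_attrs, unordered) for child in elem]
    if unordered:
        # Sort siblings by their serialised form so document order is irrelevant.
        children.sort(key=ET.tostring)
    out.extend(children)
    return out

def canonical(xml_text, **rules):
    """Parse, normalise, and serialise so that plain equality can be used."""
    return ET.tostring(normalise(ET.fromstring(xml_text), **rules))

expected = '<run time="10:00"><b/><a x="1" y="2"/></run>'
actual = '<run time="11:30">\n  <a y="2" x="1"/>\n  <b/>\n</run>'
rules = {"ignore_attrs": {"time"}, "unordered": True}
same = canonical(expected, **rules) == canonical(actual, **rules)
```

The nice property of this split is that the comparison itself stays trivial, and all the "what counts as the same" decisions live in one place that each test suite can configure.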

Phil Nash