If I want to compare the contents of two XmlDocument objects, is it just like this?

XmlDocument doc1 = GetDoc1();
XmlDocument doc2 = GetDoc2();

if(doc1 == doc2)
{

}

I am not checking whether they are the same object reference, but whether the CONTENTS of the XML are the same.

+4  A: 

No. XmlDocument neither overloads the == operator nor overrides the Equals() method, so your check is in fact just performing reference equality, which will fail in your example unless both variables refer to the same object instance.
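To make the reference-equality point concrete, here is a minimal sketch. The class name `ReferenceEqualityDemo` and its `Parse` and `SameReference` helpers are made up for illustration; only `XmlDocument` and `LoadXml` come from the framework.

```csharp
using System.Xml;

public static class ReferenceEqualityDemo
{
    // Hypothetical helper: parses an XML string into a new XmlDocument.
    public static XmlDocument Parse(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    // Mirrors the check from the question. XmlDocument has no == overload,
    // so this is true only when both variables point at the same instance.
    public static bool SameReference(XmlDocument a, XmlDocument b)
    {
        return a == b;
    }
}
```

Two documents parsed from the very same string are still two instances, so `SameReference(doc1, doc2)` and `doc1.Equals(doc2)` both return false. Passing the same variable twice returns true.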

If you want to compare the contents (attributes, elements, comments, PIs, etc.) of a document, you will have to implement that logic yourself. Be warned: it's not trivial.

Depending on your exact scenario, you may be able to remove all non-essential whitespace from the document (which itself can be tricky) and then compare the resulting XML text. This is not perfect: it reports a difference for documents that are semantically identical but differ in things like how namespaces are used and declared, whether certain values are escaped, the order of attributes (or of elements, if you treat element order as insignificant), and so on. As I said before, XML comparison is not trivial.
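A rough sketch of that text-based approach, assuming the whitespace you want to ignore is the whitespace-only text between tags. `XmlTextComparer`, `ToCompactText` and `TextEquals` are illustrative names, not framework APIs. `PreserveWhitespace` is a real `XmlDocument` property, and leaving it at false makes the loader drop whitespace-only nodes.

```csharp
using System;
using System.Xml;

public static class XmlTextComparer
{
    // Loads the XML with whitespace-only nodes dropped and returns
    // the re-serialized markup as a single string.
    public static string ToCompactText(string xml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = false; // the default, set explicitly for clarity
        doc.LoadXml(xml);
        return doc.OuterXml;
    }

    // Compares the compacted text of two XML strings character by character.
    public static bool TextEquals(string xml1, string xml2)
    {
        return string.Equals(ToCompactText(xml1), ToCompactText(xml2), StringComparison.Ordinal);
    }
}
```

This treats `<a><b>1</b></a>` and an indented copy of it as equal, but it still reports `<a x="1" y="2" />` and `<a y="2" x="1" />` as different, which is exactly the kind of limitation described above.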

You also need to clearly define what it means for two XML documents to be "identical". Does element or attribute ordering matter? Does case (in text nodes) matter? Should you ignore superfluous CDATA sections? Do processing instructions count? What about fully qualified vs. partially qualified namespaces?

In any general-purpose implementation, you're likely going to want to transform both documents into some canonical form (be it XML or some other representation) and then compare the canonicalized content.
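As an illustration of the canonical-form idea, here is a deliberately simplified sketch using LINQ to XML. It is not the W3C Canonical XML algorithm. It assumes that attribute order, comments and processing instructions do not matter, and that element order does. `XmlCanonicalizer`, `Normalize` and `CanonicallyEqual` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlCanonicalizer
{
    // Rebuilds an element with its attributes in a fixed order,
    // keeping only element and text children (comments and PIs are dropped).
    public static XElement Normalize(XElement element)
    {
        var attributes = element.Attributes()
            .OrderBy(a => a.Name.NamespaceName, StringComparer.Ordinal)
            .ThenBy(a => a.Name.LocalName, StringComparer.Ordinal)
            .Select(a => new XAttribute(a.Name, a.Value));

        var children = element.Nodes()
            .Where(n => n is XElement || n is XText)
            .Select(n => n is XElement e
                ? (XNode)Normalize(e)
                : new XText(((XText)n).Value)); // CDATA nodes become plain text nodes

        return new XElement(element.Name, attributes, children);
    }

    // Normalizes both documents, then compares the normalized trees.
    public static bool CanonicallyEqual(string xml1, string xml2)
    {
        XElement root1 = Normalize(XDocument.Parse(xml1).Root);
        XElement root2 = Normalize(XDocument.Parse(xml2).Root);
        return XNode.DeepEquals(root1, root2);
    }
}
```

The rules baked into `Normalize` are just one possible answer to the "what does identical mean" questions above. Differently spelled namespace prefixes, for instance, are still treated as a difference here, because the declarations are copied as ordinary attributes.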

Tools that perform XML differencing already exist, such as Microsoft XML Diff/Patch, and you may be able to use one to identify the differences between two documents. To my knowledge that tool is not distributed in source form ... so to use it in an embedded application you would need to script the process. If you plan to use it, you should first verify that the licensing terms allow its use and redistribution.

EDIT: Check out @Max Toro's answer if you're using .NET 3.5 SP1, as apparently there's an option in XLinq that may be helpful. Nice to know it exists.

LBushkin
A: 

LBushkin is right, this is not trivial. Since XML is string data, you could technically compute a hash of each document's contents and compare the hashes, but the result will be affected by things like whitespace.
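For what it's worth, the hashing idea might look like the sketch below. SHA-256 is just one choice of hash, and `XmlHasher` and `HashText` are illustrative names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class XmlHasher
{
    // Returns the SHA-256 hash of the raw XML text as an uppercase hex string.
    public static string HashText(string xml)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(xml));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

Because the hash is taken over the raw characters, `<a><b>1</b></a>` and `<a> <b>1</b> </a>` produce different values even though many applications would consider them the same document.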

You could perform a structured diff (also called 'XML diffgram') between the two documents and compare the results. This is how .NET datasets keep track of changes, for example.

Other than that you'd have to iterate through the DOM and compare elements, attributes and values to each other. If there's a schema involved then you would also have to take into account positions and so on.
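A minimal sketch of such a DOM walk, under a few assumptions of my own: attributes are matched by name regardless of order, child nodes must appear in the same order, and non-element nodes are compared by `Value` only. `XmlDomComparer` and its methods are hypothetical names.

```csharp
using System.Xml;

public static class XmlDomComparer
{
    // Recursively compares two elements: same qualified name, same attributes
    // (in any order) and pairwise-equal child nodes in document order.
    public static bool ElementsEqual(XmlElement a, XmlElement b)
    {
        if (a.LocalName != b.LocalName || a.NamespaceURI != b.NamespaceURI) return false;
        if (a.Attributes.Count != b.Attributes.Count) return false;

        foreach (XmlAttribute attr in a.Attributes)
        {
            XmlAttribute other = b.GetAttributeNode(attr.LocalName, attr.NamespaceURI);
            if (other == null || other.Value != attr.Value) return false;
        }

        XmlNode x = a.FirstChild;
        XmlNode y = b.FirstChild;
        while (x != null && y != null)
        {
            if (x.NodeType != y.NodeType) return false;

            if (x is XmlElement childElement)
            {
                if (!ElementsEqual(childElement, (XmlElement)y)) return false;
            }
            else if (x.Value != y.Value)
            {
                return false; // text, CDATA, comment, etc. compared by value
            }

            x = x.NextSibling;
            y = y.NextSibling;
        }

        // Equal only if both child lists ran out at the same time.
        return x == null && y == null;
    }

    // Compares two documents starting from their root elements.
    public static bool DocumentsEqual(XmlDocument doc1, XmlDocument doc2)
    {
        return ElementsEqual(doc1.DocumentElement, doc2.DocumentElement);
    }
}
```

Namespace declarations are ordinary attributes in the DOM, so two documents that bind the same namespace to different prefixes still compare as different here. Handling that would need extra logic.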

kprobst
+3  A: 

Try the XNode.DeepEquals method in the XLinq (LINQ to XML) API.

XDocument doc1 = GetDoc1(); 
XDocument doc2 = GetDoc2(); 

if(XNode.DeepEquals(doc1, doc2)) 
{ 

} 
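If the documents already exist as XmlDocument instances, as in the question, one possible bridge is to copy each one into an XDocument first. This is only a sketch: `XmlDocumentDeepEquals`, `ToXDocument` and `ContentEquals` are made-up names, while `XmlNodeReader`, `XDocument.Load` and `XNode.DeepEquals` are the framework pieces it relies on.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class XmlDocumentDeepEquals
{
    // Copies an XmlDocument into an XDocument by reading the DOM
    // through an XmlNodeReader.
    public static XDocument ToXDocument(XmlDocument doc)
    {
        using (var reader = new XmlNodeReader(doc))
        {
            return XDocument.Load(reader);
        }
    }

    // Compares the contents of two XmlDocument instances via DeepEquals.
    public static bool ContentEquals(XmlDocument doc1, XmlDocument doc2)
    {
        return XNode.DeepEquals(ToXDocument(doc1), ToXDocument(doc2));
    }
}
```

Check how DeepEquals treats attribute order, comments and namespace declarations against your own definition of "identical" before relying on it.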
Max Toro
Didn't know about it, thank you.
Alex Bagnolini
Very nice. I did not know this existed. It looks like it handles many of the cases that I describe.
LBushkin