I want to compare XML documents, some of which are 50k+ in size. I'm comparing their OuterXml strings. Is this efficient? Is there a more efficient way?

+7  A: 

Comparing only the textual representation of your XML will not give valid results. Check this out:

<node x="1" y="2" />

and

<node y="2" x="1" />

are identical as far as XML processing goes (order of attributes on a node is irrelevant), but when you compare nothing but text, you'll flag it as a difference.
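For illustration, here is a minimal sketch of a comparison that ignores attribute order by comparing parsed trees instead of strings. It uses Python's standard-library `xml.etree.ElementTree` rather than .NET, the function names `elements_equal` and `xml_equal` are hypothetical, and it assumes that whitespace around text content is insignificant.

```python
import xml.etree.ElementTree as ET


def elements_equal(a: ET.Element, b: ET.Element) -> bool:
    """Recursively compare two elements; attribute order is ignored."""
    # attrib is a dict, so dict equality does not depend on attribute order.
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    # Assumption: leading/trailing whitespace in text and tail is insignificant.
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    # len(element) is the number of child elements.
    if len(a) != len(b):
        return False
    return all(elements_equal(ca, cb) for ca, cb in zip(a, b))


def xml_equal(x: str, y: str) -> bool:
    """Parse both strings and compare the resulting trees."""
    return elements_equal(ET.fromstring(x), ET.fromstring(y))
```

With this, `xml_equal('<node x="1" y="2" />', '<node y="2" x="1" />')` is true even though the two strings differ. Child order still matters here, which is usually what you want for XML.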

Microsoft used to have an XmlDiff tool on GotDotNet, but I'm not sure whether it is still available anywhere.

UPDATE
XmlDiff still seems to be available: check out this download link, as well as this "Using the XML Diff and Patch Tool in your application" link on MSDN.

Marc

marc_s
+3  A: 

It depends on what kind of comparison you want.

For instance, if you just want to compare the content of two files and get a true/false result, I would suggest opening an XmlReader on each of the two files and reading through their nodes side by side. The moment you encounter a difference, you can stop parsing.
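A minimal sketch of that early-exit idea, written in Python rather than C#: `xml.etree.ElementTree.iterparse` stands in for `XmlReader`, reading both documents incrementally and returning at the first mismatched event. The name `streams_equal` is hypothetical. It assumes documents without mixed content (text after a child element, the `tail` in ElementTree terms, is not compared) and treats whitespace around text as insignificant.

```python
import io
import xml.etree.ElementTree as ET
from itertools import zip_longest


def streams_equal(src_a, src_b) -> bool:
    """Compare two XML sources event by event, stopping at the first difference.

    iterparse is used as a rough analogue of .NET's XmlReader: it reads the
    input in chunks, so parsing stops shortly after a mismatch is found.
    """
    events_a = ET.iterparse(src_a, events=("start", "end"))
    events_b = ET.iterparse(src_b, events=("start", "end"))
    for ev_a, ev_b in zip_longest(events_a, events_b):
        if ev_a is None or ev_b is None:
            return False  # one document has more nodes than the other
        (kind_a, el_a), (kind_b, el_b) = ev_a, ev_b
        if kind_a != kind_b or el_a.tag != el_b.tag:
            return False
        # Attributes are available at "start"; dict comparison ignores order.
        if kind_a == "start" and el_a.attrib != el_b.attrib:
            return False
        if kind_a == "end":
            # Text is complete by the "end" event.
            if (el_a.text or "").strip() != (el_b.text or "").strip():
                return False
            # Free memory for elements that have already been compared.
            el_a.clear()
            el_b.clear()
    return True


# Usage with in-memory sources; file paths or open binary files work too.
same = streams_equal(io.BytesIO(b'<r><n x="1" y="2">hi</n></r>'),
                     io.BytesIO(b'<r><n y="2" x="1">hi</n></r>'))
```

Clearing each element after its "end" event keeps memory roughly proportional to document depth rather than size, which is the main advantage over loading two full documents and comparing OuterXml.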

This is different from using XmlDocument, where you have to load the entire document into memory, get its string representation, and compare the strings. (For smaller files this does not matter.)

Two XML documents can be semantically equivalent even though their structure differs, in which case your comparison has to be smarter.

If you intend to modify the source document depending on whether the comparison fails or succeeds, then the DOM approach (the XmlDocument class and its API) is preferable.

Prashanth
I agree; it depends on whether you need logical equivalence (the byte data contained in both XML documents is identical) or semantic equivalence (the information represented in both XML documents has the same meaning). For instance, if differences in insignificant whitespace (e.g. indentation/formatting whitespace between elements) matter, then you have to approach the comparison differently than if you just want to know whether each attribute, element, and node has the same data.
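To make the distinction concrete, here is a hedged sketch in Python (not .NET) contrasting a byte-for-byte check with a comparison of canonical forms from the standard-library `xml.etree.ElementTree.canonicalize` (Python 3.8+, Canonical XML 2.0). Canonicalization normalizes attribute order, quoting, and empty-element syntax; `strip_text=True` additionally trims whitespace around text, treating indentation as insignificant. The helper names are hypothetical, and whether stripping whitespace is acceptable depends on your documents.

```python
import xml.etree.ElementTree as ET


def bytes_equal(a: str, b: str) -> bool:
    # Equivalence in the byte sense: the serialized documents are identical.
    return a.encode("utf-8") == b.encode("utf-8")


def canonical_equal(a: str, b: str, ignore_whitespace: bool = True) -> bool:
    # C14N 2.0 sorts attributes and writes <x/> as <x></x>; strip_text=True
    # also drops leading/trailing whitespace in text content.
    return (ET.canonicalize(a, strip_text=ignore_whitespace)
            == ET.canonicalize(b, strip_text=ignore_whitespace))


pretty = "<order id='7'>\n  <item sku='A' qty='2'/>\n</order>"
compact = '<order id="7"><item qty="2" sku="A"></item></order>'
print(bytes_equal(pretty, compact))      # False
print(canonical_equal(pretty, compact))  # True
```

Passing `ignore_whitespace=False` keeps the indentation text, so the two documents above would then compare as different.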
Burly
+2  A: 

There is also this open source project: http://diffxml.sourceforge.net/

I have used both XmlDiff from Microsoft and this framework. I think MS XmlDiff has somewhat more comparison features, so that is what I use now. But if you want open source, DiffXml is a good framework.

Karsten
+1  A: 

To compare XML files, I had trouble with MS XmlDiff, so I wrote a much simpler comparison method: a small application that selects all elements that have attributes, since the XML files I needed to compare don't store values in the element nodes themselves. This selection is really easy in XPath: //*[@*]
I did this for both documents, which gave me two lists of nodes. Then I converted each of these nodes into an XPath string by walking recursively up through its parent nodes and adding their attribute values as conditions. This left me with two lists of XPath strings.
The final step was walking through these lists and checking whether the other document has a node with each given XPath. If not, that node was missing, and I knew exactly which element it was. I wrote this list of missing nodes to a text file, which gave me a simple report of the differences between the two documents, ignoring attribute order, element values, and all elements without attributes. That was exactly what I needed.
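A rough sketch of the approach described above, in Python rather than the original application's language. Assumptions: `attribute_paths` and `missing_elements` are hypothetical names; namespaces are not handled; and instead of evaluating each XPath against the other document, it builds the path strings for both documents (with attributes sorted, so attribute order is ignored) and takes a set difference, which is a simplification of the lookup step.

```python
import xml.etree.ElementTree as ET


def attribute_paths(xml_text: str) -> set[str]:
    """Build an XPath-like string for every element that has attributes."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def step(el: ET.Element) -> str:
        # Attribute conditions in sorted order, so attribute order is irrelevant.
        conds = "".join(f"[@{k}='{v}']" for k, v in sorted(el.attrib.items()))
        return el.tag + conds

    paths = set()
    for el in root.iter():
        if not el.attrib:
            continue  # equivalent of //*[@*]: only elements with attributes
        parts = []
        node = el
        while node is not None:  # walk up to the root
            parts.append(step(node))
            node = parent_of.get(node)
        paths.add("/" + "/".join(reversed(parts)))
    return paths


def missing_elements(doc_a: str, doc_b: str) -> list[str]:
    """Paths present in doc_a but not in doc_b, sorted for a stable report."""
    return sorted(attribute_paths(doc_a) - attribute_paths(doc_b))


a = '<config><item id="1" name="x"/><item id="2"/></config>'
b = '<config><item name="x" id="1"/></config>'
print(missing_elements(a, b))  # ["/config/item[@id='2']"]
```

Running it in both directions (`missing_elements(a, b)` and `missing_elements(b, a)`) gives the elements missing from each side, which can then be written to the text report.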

But if you need a more complex XML comparison, read the other answers. :-)

Workshop Alex
A: 

Project: Merge is a Windows application that can compare (and merge) XML files.

James