+5  Q: 

XML Diff and Merge

Hello, I think I have a rather unique problem to solve; at least, I can't find enough information using Google. So here it goes:

I work on a JEE SOA application which stores XML documents as XML using Oracle XML DB. Whenever the XML changes, I increment the version and move the previous version into a different table.

The requirement now is that I should store the differences between two versions as XML, instead of the whole XML document.

1) Is there any Java library which can do XML comparison? (XMLUnit, ... ?)
2) Is there a standard XML Schema for capturing XML differences?
3) What transformation technology can I use to apply the "differences" to an XML document to go back and forth between versions? (XSLT, Groovy, ... ?)

I appreciate your time.

+3  A: 

There are any number of open-source XML diff tools written in Java that you can crib from. One list of such tools is here.

Jekke
A: 

I think you should not fixate on XML comparison, but look at text comparison as a whole.

Just store a text diff of the two XML files. You can use the patch format, which you can apply later to reconstruct the original file.

Alex Shnayder
A text diff is very different from an XML diff.
Very bad idea, IMHO. An XML-aware diff works on the infoset, not on the text representation. With a text diff, even recoding the XML file from UTF-8 to UTF-16 would register as a change, something I find hard to swallow.
bortzmeyer
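
To make the encoding point concrete, here is a minimal Java sketch using only the JDK's built-in DOM parser. The class name EncodingDiffDemo and the sample document are made up for illustration. It encodes the same document as UTF-8 and as UTF-16, then shows that the bytes differ while the parsed element trees compare as equal.

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EncodingDiffDemo {
    // Hypothetical sample document; any well-formed XML would do.
    static final String BODY = "<order id=\"42\"><item>caf\u00e9</item></order>";

    // Serialize the same logical document in the given encoding.
    static byte[] encode(String charsetName) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charsetName + "\"?>" + BODY;
        return xml.getBytes(charsetName);
    }

    static Document parse(byte[] bytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
    }

    // Compare the parsed element trees instead of the raw bytes.
    static boolean sameInfoset(byte[] a, byte[] b) throws Exception {
        return parse(a).getDocumentElement().isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        byte[] utf8 = encode("UTF-8");
        byte[] utf16 = encode("UTF-16");
        System.out.println("same bytes:   " + Arrays.equals(utf8, utf16));
        System.out.println("same content: " + sameInfoset(utf8, utf16));
    }
}
```

A byte-level or line-level diff would report the whole file as changed here, whereas a comparison on the parsed tree reports no change.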
+2  A: 

In my last job, we had a similar problem: We had to detect changes, insertions, and deletions of specific items between two XML files. The files weren't arbitrary XML; they had to adhere to our XSD.

Our solution was to implement a kind of merge sort: Parse the files (using a SAX parser, not a DOM parser, to permit arbitrarily large files), and store the parsed data in separate HashMaps. Then, we compared the contents of the two maps using a merge-sort type of algorithm.
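
The parse-then-compare approach described above might look roughly like the following sketch. It is not the original implementation: the item element, its id attribute, and the class name SaxMapDiff are assumptions made up for illustration, and the comparison is a plain merge-style walk over the two key sets in sorted order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxMapDiff {
    // Stream the document with SAX and collect id -> text for each <item id="...">.
    static Map<String, String> load(String xml) throws Exception {
        Map<String, String> items = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentId;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("item".equals(qName)) {
                    currentId = atts.getValue("id");
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (currentId != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("item".equals(qName) && currentId != null) {
                    items.put(currentId, text.toString().trim());
                    currentId = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return items;
    }

    // Walk both maps in key order, as in the merge step of a merge sort.
    static List<String> diff(Map<String, String> oldItems, Map<String, String> newItems) {
        List<String> changes = new ArrayList<>();
        Iterator<Map.Entry<String, String>> a = new TreeMap<>(oldItems).entrySet().iterator();
        Iterator<Map.Entry<String, String>> b = new TreeMap<>(newItems).entrySet().iterator();
        Map.Entry<String, String> x = a.hasNext() ? a.next() : null;
        Map.Entry<String, String> y = b.hasNext() ? b.next() : null;
        while (x != null || y != null) {
            int c = (x == null) ? 1 : (y == null) ? -1 : x.getKey().compareTo(y.getKey());
            if (c < 0) {                       // key only in the old version
                changes.add("deleted " + x.getKey());
                x = a.hasNext() ? a.next() : null;
            } else if (c > 0) {                // key only in the new version
                changes.add("inserted " + y.getKey());
                y = b.hasNext() ? b.next() : null;
            } else {                           // key in both: compare values
                if (!x.getValue().equals(y.getValue())) changes.add("changed " + x.getKey());
                x = a.hasNext() ? a.next() : null;
                y = b.hasNext() ? b.next() : null;
            }
        }
        return changes;
    }

    public static void main(String[] args) throws Exception {
        String v1 = "<items><item id=\"a\">1</item><item id=\"b\">2</item><item id=\"c\">3</item></items>";
        String v2 = "<items><item id=\"a\">1</item><item id=\"b\">20</item><item id=\"d\">4</item></items>";
        System.out.println(diff(load(v1), load(v2)));
    }
}
```

This version keeps both maps in memory, so it only illustrates the comparison step, not the handling of very large files.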

Naturally, the larger the files got, the more memory pressure we experienced, so I ultimately wrote a FileHashMap class that pushed the HashMap's value space to random access files. While theoretically slower, this solution allowed our comparisons to work with very large files, without thrashing or OutOfMemoryError conditions. (A version of that FileHashMap class is available in this library: http://www.clapper.org/software/java/util/)

I have no idea whether what I just described is even remotely close to what you need, but I thought I'd share it, just in case.

Good luck.

Brian Clapper
+3  A: 

Side note: there is now a standard format for XML-aware "patches", defined in RFC 5261. There is at least one free software program, xmlpatch, which implements it. It is written in C, but you can call it from Java.
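
To give a feel for the format, here is a rough Java sketch that applies a small subset of RFC 5261-style operations (add, replace, and remove, each selecting a single element through the sel attribute) using the JDK's DOM and XPath APIs. It is not the xmlpatch program and not a conforming implementation: namespaces, attribute and text targets, and the pos, type, and ws attributes are ignored, and the class name XmlPatchSketch and the sample documents are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlPatchSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Apply each child operation of the patch root to the target document, in order.
    static Document apply(String docXml, String patchXml) throws Exception {
        Document doc = parse(docXml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList ops = parse(patchXml).getDocumentElement().getChildNodes();
        for (int i = 0; i < ops.getLength(); i++) {
            if (!(ops.item(i) instanceof Element)) continue; // skip whitespace between operations
            Element op = (Element) ops.item(i);
            String sel = op.getAttribute("sel");
            // The selector is evaluated with the document root node as context.
            Node target = (Node) xpath.evaluate(sel, doc, XPathConstants.NODE);
            if (target == null) throw new IllegalArgumentException("no match for sel: " + sel);
            switch (op.getTagName()) {
                case "remove":
                    target.getParentNode().removeChild(target);
                    break;
                case "replace":
                    Node replacement = doc.importNode(firstElementChild(op), true);
                    target.getParentNode().replaceChild(replacement, target);
                    break;
                case "add": // simplified: always append the content as the last children
                    NodeList content = op.getChildNodes();
                    for (int j = 0; j < content.getLength(); j++) {
                        target.appendChild(doc.importNode(content.item(j), true));
                    }
                    break;
                default:
                    throw new IllegalArgumentException("unsupported operation: " + op.getTagName());
            }
        }
        return doc;
    }

    static Element firstElementChild(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) return (Element) n;
        }
        throw new IllegalArgumentException("operation has no element content");
    }

    public static void main(String[] args) throws Exception {
        String doc = "<doc><foo a=\"1\"/><foo a=\"2\"/><bar/></doc>";
        String patch = "<diff>"
                + "<remove sel=\"doc/bar\"/>"
                + "<replace sel=\"doc/foo[@a='1']\"><baz/></replace>"
                + "<add sel=\"doc\"><new>x</new></add>"
                + "</diff>";
        Element root = apply(doc, patch).getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName());
        }
    }
}
```

A patch like this only moves a document forward. To go back to an earlier version you would need to store the inverse operations as well.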

bortzmeyer