I would like to compute the diff between two XML files or nodes using XSL/XSLT. Is there any stylesheet readily available or any simple way of doing it?
Interesting question! I once tried to do something similar involving two XML sources, and my experience was that there just ain't no way.
You could use XSL's facility for including user-built functions, and code up something really slick. But I really can't see it.
If I were to do something like this, I'd process the two XML files in parallel using DOM4J, which lets me easily traverse the trees programmatically and do detailed sub-queries.
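For what it's worth, here is a minimal sketch of that kind of parallel traversal. It is written in Python with the standard library's xml.etree.ElementTree rather than DOM4J, and the function name diff_elements and the sample documents are made up for illustration. It pairs children purely by position, so treat it as a starting point rather than a real diff algorithm.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest


def diff_elements(a, b, path=""):
    """Walk two elements in parallel (children paired by position) and list differences."""
    if a.tag != b.tag:
        # Different element names: report it and do not descend any further.
        return [f"{path}/{a.tag}: element is <{b.tag}> in second document"]
    here = f"{path}/{a.tag}"
    diffs = []
    if a.attrib != b.attrib:
        diffs.append(f"{here}: attributes differ")
    if (a.text or "").strip() != (b.text or "").strip():
        diffs.append(f"{here}: text differs")
    # zip_longest pads the shorter child list with None, which exposes extra children.
    for index, (child_a, child_b) in enumerate(zip_longest(a, b), start=1):
        if child_b is None:
            diffs.append(f"{here}: child {index} <{child_a.tag}> only in first document")
        elif child_a is None:
            diffs.append(f"{here}: child {index} <{child_b.tag}> only in second document")
        else:
            diffs.extend(diff_elements(child_a, child_b, here))
    return diffs


# Hypothetical sample input, just to show the call.
old = ET.fromstring("<blarg><a>1</a><b>2</b></blarg>")
new = ET.fromstring("<blarg><a>1</a><b>3</b><c/></blarg>")
for line in diff_elements(old, new):
    print(line)
```

Because the pairing is positional, a single inserted element near the top will make everything after it look different. Matching children by a key (an id attribute, say) would be the next refinement.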
Trying to do this in XSLT will either prove you to be a genius or drive you into madness.
XSLT is data-driven, that is, it goes through the single source XML file top to bottom looking for template matches in the XSL stylesheet. The templates don't really know where they are in the data, they just run their code when matched. You can reference another XML source, but the program will run according to the traversal of the original source.
So when you arrive at the nth child element of <blarg>, for example, you could look up the nth child of <blarg> in a second XML using the document() function. But the usefulness of this depends on the structure of your XML and what comparisons you're trying to do.
This behavior is the opposite of most traditional scripts, which run through the program code top to bottom, calling on the data file when instructed. The latter, pull processing, is what you probably need to compare two XML sources. A position-by-position comparison in XSLT will break down as soon as there is a structural difference, since the nodes after it no longer line up.
There are ways to do this, but I wouldn't say it's simple.
In the past I've used an open-source utility called diffmk, which produces an output XML with extra tags showing what has been added or removed.
I had to write an extra stylesheet to then convert this into a more readable HTML report.
Some diff tools, like DiffDog (from Altova, the makers of XMLSpy), are good but costly.
This is not a mystery! Here are the general steps:
1.) @carillonator is right about how XSLT processes documents. So, to make it easier, we combine the two versions of your document into a single document that you can run your XSLT diff on (you can do this from the command line with bash, with whatever programming language you are using, or even with another XSLT transform in a pipeline). It's just an encapsulation:
<diff_container>
<version1>
... first version here
</version1>
<version2>
... second version here
</version2>
</diff_container>
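As a rough sketch of that wrapping step, here is one way to do it in Python with the standard library's xml.etree.ElementTree. The function name build_diff_container and the sample inputs are hypothetical, and the sketch assumes each version is a well-formed document passed in as a string.

```python
import xml.etree.ElementTree as ET


def build_diff_container(first_xml, second_xml):
    """Wrap two XML documents (given as strings) in one <diff_container> element."""
    container = ET.Element("diff_container")
    for wrapper_name, xml_text in (("version1", first_xml), ("version2", second_xml)):
        wrapper = ET.SubElement(container, wrapper_name)
        # Parse each version and attach its root element under the wrapper.
        wrapper.append(ET.fromstring(xml_text))
    return container


# Hypothetical sample input, just to show the call.
combined = build_diff_container("<doc><p>old</p></doc>", "<doc><p>new</p></doc>")
print(ET.tostring(combined, encoding="unicode"))
```

Parsing the inputs instead of concatenating strings avoids ending up with two XML declarations in the middle of the combined document.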
2.) We then run this document through our XSLT diff. The XSLT then has the job of simply traversing the tree and comparing nodes between the two versions. This can go from very simple (was an element changed? moved? removed?) to semi-complex. A good understanding of XPath makes this fairly simple.
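To make the comparison logic concrete, here is a small sketch of the checks such a stylesheet would need to perform, written in Python (standard library only) rather than XSLT. The function names indexed_paths and diff_container_versions are made up for illustration. It assumes each of <version1> and <version2> holds exactly one root element, it compares element text only (attributes are ignored), and it does not detect moved elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def indexed_paths(root):
    """Map XPath-like locations such as /doc[1]/p[2] to each element's stripped text."""
    result = {}

    def visit(node, parent_path, position):
        here = f"{parent_path}/{node.tag}[{position}]"
        result[here] = (node.text or "").strip()
        seen = Counter()  # counts siblings per tag to build the [n] index
        for child in node:
            seen[child.tag] += 1
            visit(child, here, seen[child.tag])

    visit(root, "", 1)
    return result


def diff_container_versions(container):
    """Compare the single root element under <version1> with the one under <version2>."""
    old = indexed_paths(container.find("version1")[0])
    new = indexed_paths(container.find("version2")[0])
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


# Hypothetical sample input, just to show the call.
container = ET.fromstring(
    "<diff_container>"
    "<version1><doc><title>A</title><p>one</p><p>two</p></doc></version1>"
    "<version2><doc><title>B</title><p>one</p><note>x</note></doc></version2>"
    "</diff_container>"
)
print(diff_container_versions(container))
```

In XSLT the same idea would amount to computing a location for each node under version1 and looking for a node at the same location under version2, which is where the XPath knowledge comes in.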
As some said before, you're working inside a different environment, so you are limited compared to tools like DiffDog. However, having the algorithm in XSLT can have real value too.
Hope this helped. Cheers!