First, let me describe the problem I'm trying to solve.

We have a third-party application that uses XML documents to store all of its business logic, lookup tables, and such. The application ships with a base set of XML files and uses a kind of inheritance model: it exposes inherited XML files that we edit to customize the business logic. I say "kind of" because its implementation of inheritance is horrible.

Currently there are over 3000 separate XML files, ranging from 1 KB to 5000 KB and totaling about 600 MB. The only good thing so far is that they all use the same XSD.

Our problem is that we receive monthly updates to the core XML files, and we're supposed to put them in place and upgrade our custom documents to line up with the new version of the base documents. We currently do this manually, using DiffDog and piecing the documents together to create new ones, but I'm trying to wrap my head around doing it programmatically. Let me try to visualize this for you:

We start off with a structure like the one below, with the base template in place and a custom template in which we define our custom rules (which we do a lot):

..\LineOfBusiness\BaseTemplates\BaseXml_1_0_0_0.xml
..\LineOfBusiness\CustomTemplates\Document_1_0_0_0.xml

We're then given an upgrade each month, so now we have a structure like this:

..\LineOfBusiness\BaseTemplates\BaseXml_1_0_0_0.xml
..\LineOfBusiness\BaseTemplates\BaseXml_1_1_0_0.xml
..\LineOfBusiness\CustomTemplates\Document_1_0_0_0.xml

Our job essentially is to create the

..\LineOfBusiness\CustomTemplates\Document_1_1_0_0.xml

document ourselves every month, bringing the changes we made in the previous version into the new version's logic.

I know this system is ridiculous, but I can't change that today. Any ideas on how to tackle this problem would be great. I can tell you what I've thought of so far...

  1. Deserialize the old Base and old Custom documents to get a list of specific differences, then apply those differences to a deserialized version of the new Base and reserialize it to XML.

  2. Apply some sort of annotation process to the Custom Templates, so that we can extract the differences programmatically at upgrade time.

  3. Outsource the upgrade process...
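As a rough illustration of option 1, here is a minimal Python sketch of the "diff old Base against old Custom, then replay onto new Base" idea. It rests on an assumption the real files may not satisfy: that every element of interest carries a stable key attribute (the hypothetical `id` below), so that elements can be matched across versions by key rather than by position. The function names are made up for illustration, and the sketch only handles changed attributes and text, not added or removed elements.

```python
import xml.etree.ElementTree as ET

KEY_ATTR = "id"  # hypothetical stable key; the real schema may use something else


def index_by_key(root):
    """Map key -> element for every element that carries KEY_ATTR."""
    return {el.get(KEY_ATTR): el for el in root.iter() if el.get(KEY_ATTR) is not None}


def extract_customizations(old_base, old_custom):
    """Collect attribute/text changes that the custom doc made to the old base."""
    base_idx = index_by_key(old_base)
    changes = {}
    for key, custom_el in index_by_key(old_custom).items():
        base_el = base_idx.get(key)
        if base_el is None:
            continue  # elements added by the custom doc are out of scope here
        change = {"attrs": {k: v for k, v in custom_el.attrib.items()
                            if base_el.get(k) != v}}
        if (custom_el.text or "").strip() != (base_el.text or "").strip():
            change["text"] = custom_el.text
        if change["attrs"] or "text" in change:
            changes[key] = change
    return changes


def apply_customizations(new_base, changes):
    """Replay the changes onto the new base; return keys that no longer exist."""
    idx = index_by_key(new_base)
    unapplied = []
    for key, change in changes.items():
        el = idx.get(key)
        if el is None:
            unapplied.append(key)  # needs manual review
            continue
        el.attrib.update(change["attrs"])
        if "text" in change:
            el.text = change["text"]
    return unapplied


old_base = ET.fromstring('<rules><rule id="a" limit="10">x</rule>'
                         '<rule id="b" limit="5">y</rule></rules>')
old_custom = ET.fromstring('<rules><rule id="a" limit="25">x</rule>'
                           '<rule id="b" limit="5">custom</rule></rules>')
new_base = ET.fromstring('<rules><rule id="a" limit="10" mode="strict">x</rule>'
                         '<rule id="c" limit="1">z</rule></rules>')

changes = extract_customizations(old_base, old_custom)
unapplied = apply_customizations(new_base, changes)
merged_xml = ET.tostring(new_base, encoding="unicode")
```

Anything returned in `unapplied` is a customization whose target disappeared from the new Base, which is exactly the kind of case that would still need a human to look at it.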

+1  A: 

If you're using a .NET language, you might be able to accomplish what you're trying to do with Microsoft's XML Diff and Patch tool/library.

I've used it to correctly determine whether XML fragments had actually changed. This was important in our scenario because the XML we had on disk would differ textually after being stored in a SQL Server XML column: insignificant whitespace was removed and/or attributes were rearranged (only the Infoset is preserved). Comparing the text blobs would always detect a difference, even when the XML elements and values were the same.
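To make the text-versus-structure distinction concrete, here is a small Python sketch. It is not the XML Diff and Patch library, just a stand-in built on the standard library's `xml.etree.ElementTree`; the helper names `elements_equal` and `xml_equal` are made up for illustration. It treats two fragments as equal when their tags, attributes (in any order), and whitespace-trimmed text match.

```python
import xml.etree.ElementTree as ET


def _norm(text):
    """Treat missing text and surrounding whitespace as insignificant."""
    return (text or "").strip()


def elements_equal(a, b):
    """Compare two elements structurally, ignoring attribute order and padding."""
    if a.tag != b.tag or a.attrib != b.attrib:  # attrib is a dict: order-free
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(elements_equal(x, y) for x, y in zip(a, b))


def xml_equal(xml_a, xml_b):
    """Parse two XML strings and compare them structurally."""
    return elements_equal(ET.fromstring(xml_a), ET.fromstring(xml_b))


on_disk = '<row b="2" a="1">\n  <v>1</v>\n</row>'
stored = '<row a="1" b="2"><v>1</v></row>'

text_differs = on_disk != stored          # plain string comparison
same_structure = xml_equal(on_disk, stored)  # structural comparison
```

Note that this sketch trims whitespace inside text nodes as well, which is only safe when that whitespace carries no meaning in the documents being compared.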

I've not used the patching ability of the tool, only XmlDiff.

There are several nice commercial XML diff tools on the market, but I don't know of any that provide a code or scripting API. That would be a nice value-add feature!

Zach Bonham
I've looked at other Diff/Patch tools, I'll have to take a look at that one. That basic concept was something I was thinking about. Take two 5_0 docs, and generate a patch from them. Then take that patch and apply it to the 5_1 document to create the custom 5_1 document.
Jeff Sheldon
If you get something working to your liking, post about it. I'd like to hear about your experience with it. Good luck!
Zach Bonham
The XML Diff works great for comparison; the only problem is that the DiffGram it generates stores a hash value of the original document, so I can't apply the patch to the new document. It also builds the patch based on the index of the nodes, so I couldn't apply it manually to a new document either. That tool will definitely be useful in the future, but I don't think it'll help in this situation.
Jeff Sheldon
Regardless of this helping my particular circumstance, I think this is the best answer for the question, so I went ahead and marked it as answered.
Jeff Sheldon