I am using ASP.NET on .NET Framework 2.0. I may be able to upgrade the servers to 3.5 if the solution is compelling enough. Here is the problem.

I have two pieces of XML. I'll refer to piece number 1 as the template and piece number 2 as the actual.

Here's a basic example:

template:

<questions>
 <question1 msg="1234">
   <answer></answer>
 </question1>
 <question2 msg="1235">
   <answer></answer>
 </question2>
 <question3 msg="">
   <answer></answer>
 </question3>
</questions>

actual:

<questions>
 <question1 msg="1234">
   <answer>foo</answer>
 </question1>
 <question2 msg="1235">
   <answer>bar</answer>
 </question2>
 <question3 msg="dynamic">
   <answer>blob</answer>
 </question3>
</questions>

The template is generic and shared by many users; the actual is specific to an individual user.

I would like to extract the delta between the actual and the template in such a way that it can be saved independently and then subsequently re-applied to the template to arrive at a complete representation of the actual xml.

I've done some looking and found an "XML Diff and Patch" tool for .NET 1.0 that looks like it does pretty much exactly what I need, but other references to it seem to indicate that it has dropped off the radar: http://msdn.microsoft.com/en-us/library/aa302294.aspx

I've also found a number of examples that rely on the specific XML structure to manually extract the differences between the entities represented by the XML. I am generally uncomfortable with this approach and would really prefer a more generic one that is resilient to changes made to the XML.

Ideally, I'd love to find the XML diff/patch functionality built into .NET 2.0/3.5 somewhere. If not, then something that solves the above problem in a generic enough way that it doesn't break when the XML changes.

Thanks

+1  A: 

I think you may be over-engineering this. A diff/patch tool may meet your needs, but it seems to me that something less general would do just as well.

In your example, the <answer/> tag is always present in the template, and is always empty with no attributes. All of the child elements of the <questions/> tag have names beginning with "question", and they all have a msg attribute, whose value is either an integer or blank. If it's an integer, then its value matches the corresponding value from the "actual" file, but if blank it can match "dynamic".

Given these constraints, the set of possible differences is a lot simpler to describe: it's just the content of the <answer/> element under each question. This is much easier to reason about than a general purpose diff utility would be.
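
As an illustration only, the idea can be sketched as follows. This is Python rather than .NET, the names `extract_delta` and `apply_delta` are hypothetical, and it assumes the constraints described above (one <answer/> per question, with a msg attribute that may be filled in by the actual). The same approach should carry over to XmlDocument.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """<questions>
 <question1 msg="1234"><answer></answer></question1>
 <question2 msg="1235"><answer></answer></question2>
 <question3 msg=""><answer></answer></question3>
</questions>"""

ACTUAL = """<questions>
 <question1 msg="1234"><answer>foo</answer></question1>
 <question2 msg="1235"><answer>bar</answer></question2>
 <question3 msg="dynamic"><answer>blob</answer></question3>
</questions>"""


def extract_delta(template_xml, actual_xml):
    """Collect, per question element, the answer text and any changed msg."""
    template = ET.fromstring(template_xml)
    actual = ET.fromstring(actual_xml)
    delta = {}
    for question in actual:
        entry = {"answer": question.findtext("answer", default="")}
        original = template.find(question.tag)
        # Store msg only when it differs from the template (e.g. "" -> "dynamic").
        if original is None or original.get("msg") != question.get("msg"):
            entry["msg"] = question.get("msg")
        delta[question.tag] = entry
    return delta


def apply_delta(template_xml, delta):
    """Rebuild the actual document from the template plus a stored delta."""
    root = ET.fromstring(template_xml)
    for question in root:
        entry = delta.get(question.tag)
        if entry is None:
            continue  # no stored data for this question; keep the template as-is
        question.find("answer").text = entry["answer"]
        if "msg" in entry:
            question.set("msg", entry["msg"])
    return root


delta = extract_delta(TEMPLATE, ACTUAL)
rebuilt = apply_delta(TEMPLATE, delta)
```

The delta here is a plain dictionary keyed by question element name, so it can be stored in whatever form is convenient and re-applied to the shared template later.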

John Saunders
Hi John. Thanks for your answer, but the constraints you've listed are a product of my simplified example and aren't dependable in my real circumstances. The real XML has more variability than the example. One constraint that may be worth noting is that the structure of the 'actual' will always be present within the 'template'.
Todd
@Todd: that's still good. That means there will never be a difference related to structure, only content. So, if the real situation is more complex, then complicate the solution I gave. For instance, maybe there can be attributes in the "actual" which are not present in the "template"? It still doesn't seem like you need the full generality of a diff/patch tool.
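
A content-only delta of this kind could be sketched roughly as below. It is in Python for illustration, not .NET, and `diff_content` and `patch_content` are hypothetical names. It assumes both documents contain the same elements in the same order, so each element can be addressed by its path of child indexes. It records changed text and added or changed attributes, and it does not handle attributes that the actual removes.

```python
import copy
import xml.etree.ElementTree as ET

TEMPLATE = ET.fromstring(
    '<questions>'
    '<question1 msg="1234"><answer></answer></question1>'
    '<question2 msg="1235"><answer></answer></question2>'
    '<question3 msg=""><answer></answer></question3>'
    '</questions>'
)
ACTUAL = ET.fromstring(
    '<questions>'
    '<question1 msg="1234"><answer>foo</answer></question1>'
    '<question2 msg="1235"><answer>bar</answer></question2>'
    '<question3 msg="dynamic"><answer>blob</answer></question3>'
    '</questions>'
)


def diff_content(template_node, actual_node, path=(), delta=None):
    """Walk both trees in parallel and record text/attribute differences."""
    if delta is None:
        delta = {}
    change = {}
    # Compare text ignoring surrounding whitespace used for indentation.
    if (template_node.text or "").strip() != (actual_node.text or "").strip():
        change["text"] = actual_node.text or ""
    attrs = {name: value for name, value in actual_node.attrib.items()
             if template_node.get(name) != value}
    if attrs:
        change["attrs"] = attrs
    if change:
        delta[path] = change
    # Assumption: children line up one-to-one and in the same order.
    for index, (t_child, a_child) in enumerate(zip(template_node, actual_node)):
        diff_content(t_child, a_child, path + (index,), delta)
    return delta


def patch_content(template_root, delta):
    """Apply a recorded delta to a copy of the template."""
    root = copy.deepcopy(template_root)
    for path, change in delta.items():
        node = root
        for index in path:
            node = node[index]  # follow the child-index path
        if "text" in change:
            node.text = change["text"]
        for name, value in change.get("attrs", {}).items():
            node.set(name, value)
    return root


delta = diff_content(TEMPLATE, ACTUAL)
rebuilt = patch_content(TEMPLATE, delta)
```

Because the walk never looks at element names, changes to the vocabulary of the XML should not require code changes, as long as the same-structure assumption holds.
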
John Saunders
Maybe you're right, John. I'm going to investigate this option more seriously. Offhand, I can't come up with anything that breaks your concept other than the possibility of nodes in the 'template' that are not present in the 'actual'. It seems to me, though, that a solution relying on the shape of the XML content is less simple and more fragile, since the code that handles the diff/patching may need to respond to changes in the XML content. This seems like a less obvious relationship, which could become hard to manage and maintain.
Todd
@Todd: in general, I'm saying that since you own both the template and actual documents, it's ok to proceed from your knowledge of these documents. If there's a simple way to represent the possible deltas, then that's the way to go. If there's no simple way to do it, then you need a general-purpose diff algorithm.
John Saunders
So the question remains: do you know where I can find such generic diff/patch functionality? If it was built for .NET 1.0, it must be around somewhere?
Todd
I don't know where it went, which is one reason I've been recommending against using code that generic.
John Saunders
A: 

I obtained the xmlDiffPatch package from Microsoft at the following URL: http://download.microsoft.com/download/xml/Patch/1.0/WXP/EN-US/xmldiffpatch.exe

It works perfectly in my .NET 2.0 solution. Using this technique, I have been able to reduce the amount of data stored to between 2% and 25% of what we previously needed to store.

Todd