I am working on a project where we need to save data in an XML format. The problem is, over time we expect the format / schema for our data to change. What we want to be able to do is to produce scripts to migrate our data across different schema versions. We distribute our product to thousands of customers so we need to be able to run / apply these scripts at customer sites (so we can't just do the conversions by hand). I think that what we are looking for is some kind of XML data migration tool. In my mind the ideal tool could:

  1. Do an "XML diff" of two schemas to identify added/deleted/changed nodes.

  2. Allow us to specify transformation functions. So, for example, we might add a new element to our schema that is a function of the old elements. (E.g. a new element C where C = A+B, A + B are old elements).
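For point 2, a transformation function for the C = A + B example could be as small as the following Python sketch. It is a hypothetical illustration: it assumes that A and B are children of a `<record>` element and hold decimal numbers, and the names `record` and `add_sum_element` are made up for the example.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET


def add_sum_element(xml_text: str) -> str:
    """Migrate a document by adding <C> = <A> + <B> to every <record>.

    Hypothetical example: the <record>, <A>, <B> and <C> names are assumptions.
    """
    root = ET.fromstring(xml_text)
    for record in list(root.iter("record")):
        a, b = record.find("A"), record.find("B")
        if a is None or b is None:
            continue  # leave records that lack either input untouched
        c = ET.SubElement(record, "C")  # appended after the existing children
        # Decimal avoids binary floating-point noise such as 0.1 + 0.2 != 0.3.
        c.text = str(Decimal(a.text) + Decimal(b.text))
    return ET.tostring(root, encoding="unicode")
```

For example, `add_sum_element("<data><record><A>1.5</A><B>2</B></record></data>")` returns `'<data><record><A>1.5</A><B>2</B><C>3.5</C></record></data>'`.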

So I think I am looking for a kind of XML diff and patch tool which can also apply transformation functions. One tool I am looking at for this is Altova's MapForce. I'm sure others here have had to deal with XML data format migration. How did you handle it?

Edit: One point of clarification. The "diff" I plan to do is on the schema (.xsd) files. The actual changes will be made to particular data sets that follow a given schema. These data sets will be .xml files. So it's a "diff" of the schema to help figure out what changes need to be made to data sets to migrate them from one schema version to another.
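A first cut at such a schema "diff" might compare only the names of the elements declared in two .xsd files, as in this Python sketch. The function names are hypothetical, and the sketch deliberately ignores nesting, types, `ref=` declarations and target namespaces, so it can report added and deleted element names but not changed ones.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # namespace of XML Schema itself


def declared_elements(xsd_text: str) -> set:
    """Return the names of all <xs:element name="..."> declarations."""
    root = ET.fromstring(xsd_text)
    return {el.get("name") for el in root.iter(XS + "element") if el.get("name")}


def diff_schemas(old_xsd: str, new_xsd: str) -> dict:
    """Report element names added or deleted between two schema versions."""
    old, new = declared_elements(old_xsd), declared_elements(new_xsd)
    return {"added": sorted(new - old), "deleted": sorted(old - new)}
```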

A: 

"Do an "XML diff" of two schema to identify added/deleted/changed nodes."

XSDs are text, so this is trivial.

However, if you make dramatic structural changes to the XSDs, the automated diff will be largely useless.

If you make small, cosmetic changes to the XSDs, this might be helpful.

"Allow us to specify transformation functions..."

Wouldn't that be nice. Sadly, the odds of there being some trivial change ("new element C where C = A+B, A + B are old elements") are almost nil. Why make that kind of trivial change?

No, when you "... distribute our product to thousands of customers", you don't make trivial, cosmetic changes. You save up the changes so that they are truly epic, and "create significant value."

No, the odds of there being an automated schema migration are nearly zero.

Instead, design for migratability.

  1. Be sure that the version number is prominent in your XSD paths. Ideally, in the XSD name itself.

  2. Each XSD change is a Serious Governance Issue (SGI™). Everyone participates. And you write the migration scripts right then and there. Not afterwards. Not with tools. But as part of an XSD change.

    Schemas don't spontaneously change. Someone changes them for a reason. That someone can specify the changes so someone else can write (or update) the migration script.
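One way to make a prominent version number pay off at customer sites is to keep one hand-written migration per version step and chain them. The Python sketch below assumes, hypothetically, that each data file records its schema version in a `version` attribute on the root element; the `<qty>` rename and `<legacyId>` removal are invented stand-ins for real migration scripts.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical registry: MIGRATIONS[n] upgrades a document from version n to n + 1.
MIGRATIONS: Dict[int, Callable[[ET.Element], None]] = {}


def migration(from_version: int):
    """Decorator that registers a hand-written step for one version bump."""
    def register(fn):
        MIGRATIONS[from_version] = fn
        return fn
    return register


@migration(1)
def v1_to_v2(root: ET.Element) -> None:
    # Invented example change: <qty> was renamed to <quantity>.
    for el in list(root.iter("qty")):
        el.tag = "quantity"


@migration(2)
def v2_to_v3(root: ET.Element) -> None:
    # Invented example change: the obsolete <legacyId> element was dropped.
    for parent in list(root.iter()):
        for child in list(parent):  # copy so removal does not disturb iteration
            if child.tag == "legacyId":
                parent.remove(child)


def upgrade(xml_text: str, target_version: int) -> str:
    """Apply each registered step in order until target_version is reached."""
    root = ET.fromstring(xml_text)
    version = int(root.get("version", "1"))
    while version < target_version:
        step = MIGRATIONS.get(version)
        if step is None:
            raise ValueError(f"no migration from version {version}")
        step(root)
        version += 1
        root.set("version", str(version))
    return ET.tostring(root, encoding="unicode")
```

Because each step only knows about one version bump, a customer file that is several versions behind is brought forward by running the steps in sequence.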

This is far, far too serious to trust to "automated" tools. This requires real brains of real people really focusing on this as if their jobs depended on it.

S.Lott
Just an aside: XSDs are text, but a text diff will also pick up non-significant differences (e.g. whitespace, or the order of particles inside choice/all). Better to say "XSDs are XML, so use an XML diff". However, this probably isn't an issue in practice, since they are different versions of the same schema (and hopefully the only changes were significant).
13ren
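To illustrate the "use an XML diff" point: Python's standard library (3.8+) provides `xml.etree.ElementTree.canonicalize`, which serializes XML in Canonical XML 2.0 form, so differences in quoting style or empty-element syntax disappear, and with `strip_text=True` whitespace around text is dropped as well. This is only a minimal sketch of an equality check, not a real diff, and it still treats reordered children (e.g. inside `xs:all`) as a change.

```python
import xml.etree.ElementTree as ET


def same_xml(a: str, b: str) -> bool:
    """Compare two XML documents after C14N 2.0 canonicalization.

    Formatting-only differences are ignored; child order still matters.
    """
    return ET.canonicalize(a, strip_text=True) == ET.canonicalize(b, strip_text=True)
```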
S. Lott makes some excellent points here, but an automated tool can help. My experience with doing transformations like this comes from database schema changes. I agree that making changes should be taken seriously. In the database world this generally comes down to: 1. Adding/deleting/modifying columns or tables. 2. Occasionally writing a SQL procedure to map old data to new data. Step 1 can be automated via a database diff; tools can do this. Step 2 needs programming. I'm looking for an XML tool to help with step 1 that can also take more complex transforms (e.g. XSLT) as needed.
Corwin Joy
@Corwin Joy: Don't waste time on automating add/change/delete columns. It won't happen often enough in practice. Schema changes are serious business and require serious manual processes to *understand* the *meaning* behind the changes.
S.Lott
A: 

As @S.Lott says, the ability to automate transformations is unlikely. However, XSLT is a fantastic tool for formally defining how to transform XML from one format to another. It can't be automatically generated (as far as I know), but it's well worth doing things this way.

Jacob
A: 

I ended up writing a tool to do this and released the result as a SourceForge project.

What: This tool helps create scripts to migrate XML data from one version of an XML schema to a later version of the same schema. The tool creates these scripts by differencing XSD files and emitting XSLT 2.0 to automatically migrate XML data. This works well for simple data changes and can be used as "starter" code for more complex data changes.

Where: https://sourceforge.net/projects/xsdevolver/

Background: The company I work for sells a shrink-wrapped application where we save a workbook in an XML format according to a specified XSD schema. Over time, we expect the format of this schema to change. We wanted a way to help us diff schema versions as they evolve over time and generate initial XSLT to migrate data from older versions of the schema to newer versions of the schema.

Usage:

XMLSchemaEvolver SchemaVersion1.xsd SchemaVersion2.xsd

Output:

  1. A schema diff showing what elements have been changed

  2. XSLT to translate XML data from SchemaVersion1 to SchemaVersion2

How does it work?

The basic idea is this:

1) Do a diff of two XML Schema (XSD) files.

2) Each change is classified as an INSERT, DELETE, MOVE or RENAME operation.

3) For each of these operations, emit simple XSLT to carry out the desired data change.

4) These data change operations are modeled after a set of standard XSLT operations suggested by Jesper Tverskov. A full list of the transformations emitted by our code can be found in XSLT Transformations.txt in the documentation folder.
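The four steps above can be illustrated with a small Python sketch. This is not XMLSchemaEvolver's code but a hypothetical simplification: each schema version is described as a mapping from element name to parent element name, renames must be supplied as explicit hints (otherwise a rename looks like a DELETE plus an INSERT), and XSLT 2.0 is emitted only for DELETE and RENAME, layered on the standard identity template. INSERT and MOVE are left as TODO comments because they need decisions about content and placement.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, Optional[str], Optional[str]]  # (operation, old name, new name)

# Identity template: copies every attribute and node unless a more specific
# template below overrides it.
IDENTITY = (
    '  <xsl:template match="@*|node()">\n'
    '    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>\n'
    '  </xsl:template>'
)


def classify(old: Dict[str, str], new: Dict[str, str],
             renames: Optional[Dict[str, str]] = None) -> List[Op]:
    """Classify changes between two {element: parent} maps (hypothetical model)."""
    renames = renames or {}
    ops: List[Op] = [("RENAME", o, n) for o, n in sorted(renames.items())]
    for name in sorted(new.keys() - old.keys() - set(renames.values())):
        ops.append(("INSERT", None, name))
    for name in sorted(old.keys() - new.keys() - renames.keys()):
        ops.append(("DELETE", name, None))
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:  # same element, different parent
            ops.append(("MOVE", name, name))
    return ops


def emit_xslt(ops: List[Op]) -> str:
    """Emit an XSLT 2.0 stylesheet for the DELETE and RENAME operations."""
    parts = ['<xsl:stylesheet version="2.0" '
             'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">', IDENTITY]
    for op, old_name, new_name in ops:
        if op == "DELETE":
            # An empty template drops the matched element and its subtree.
            parts.append(f'  <xsl:template match="{old_name}"/>')
        elif op == "RENAME":
            # Re-create the element under its new name, keeping its content.
            parts.append(
                f'  <xsl:template match="{old_name}">\n'
                f'    <xsl:element name="{new_name}">'
                '<xsl:apply-templates select="@*|node()"/></xsl:element>\n'
                '  </xsl:template>'
            )
        else:
            # INSERT and MOVE need human decisions; leave a marker instead.
            parts.append(f'  <!-- TODO {op}: {new_name or old_name} -->')
    parts.append('</xsl:stylesheet>')
    return "\n".join(parts)
```

In the generated stylesheet, an empty template such as `<xsl:template match="legacyId"/>` removes that element, while everything not matched by a specific template is copied unchanged by the identity template.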

Corwin Joy