Hello SO,

This question concerns XML schemas and files.

Suppose I am developing a desktop application with a file-based interface, i.e. the user stores their progress in a file on disk - pretty standard for the vast majority of productivity applications and many more besides. The file is fundamentally XML, whose schema is stored by some means or another within the application.

The schema is very likely to change as new features are added. For rigorous compatibility management, I'd like the program to be able to tell, by inspecting the file, exactly which schema version it was last saved under, and then automatically pipeline the file through one or more transforms to bring it up to the working file format, i.e. the most recent schema revision.
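To make the "pipeline through one or more transforms" idea concrete, here is a minimal sketch in Python (chosen only for brevity; the same shape carries over to .NET). Everything in it is an assumption for illustration: the `schemaVersion` attribute name, the three revisions, and the changes each step makes are all hypothetical. Each step upgrades exactly one revision, so a file saved under any older revision is walked forward until it reaches the current one.

```python
import xml.etree.ElementTree as ET

CURRENT_VERSION = 3  # hypothetical latest schema revision


def v1_to_v2(root):
    # Hypothetical change: revision 2 renamed <name> to <title>.
    for el in list(root.iter("name")):
        el.tag = "title"
    return root


def v2_to_v3(root):
    # Hypothetical change: revision 3 requires a <tags> element.
    if root.find("tags") is None:
        ET.SubElement(root, "tags")
    return root


# Each entry upgrades a document from revision N to revision N + 1.
MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}


def upgrade(root):
    """Apply single-step migrations until the document is at CURRENT_VERSION."""
    # Assumption: a file with no version attribute predates versioning (revision 1).
    version = int(root.get("schemaVersion", "1"))
    if version > CURRENT_VERSION:
        raise ValueError(f"file is from a newer revision ({version})")
    while version < CURRENT_VERSION:
        root = MIGRATIONS[version](root)
        version += 1
        root.set("schemaVersion", str(version))
    return root
```

Keeping each transform single-step means a new revision only ever needs one new function, at the cost of running several steps for very old files.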

What is the best-practice way to implement this kind of functionality? The simplest method seems to me to use a different schema namespace for each revision and ensure that at least the document element of the file references the correct namespace. The trouble with this approach is that, to my mind, it breaks the relationship of file structures to one another - i.e. the document element of a file saved under revision x is the same type as the corresponding element under revision y, but as far as the application knows, they're unrelated unless I explicitly tell it otherwise. However, I dare say that this sort of logic is part of the reason for the existence of XML namespaces, so I'm honestly not sure. What say you, SO?
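For what it's worth, the namespace-per-revision idea and its drawback can both be seen in a small sketch. This is Python rather than .NET, and the namespace URIs are made up for the example. The parser treats `project` in one namespace and `project` in another as unrelated names, so the application has to carry an explicit table that ties them back together.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per schema revision.
NAMESPACE_TO_VERSION = {
    "urn:example:myapp:v1": 1,
    "urn:example:myapp:v2": 2,
}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return None, tag


def detect_version(root):
    """Return (revision, local name) for a document element, or raise if unknown."""
    namespace, local = split_tag(root.tag)
    if namespace not in NAMESPACE_TO_VERSION:
        raise ValueError(f"unrecognised schema namespace: {namespace!r}")
    return NAMESPACE_TO_VERSION[namespace], local
```

The lookup table is exactly the "explicitly tell it otherwise" step: without it, nothing says that the two document elements are the same kind of thing.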

edit #1:

It appears upon further inspection that XML Schema provides a 'version' attribute natively. This is presumably the source of the string property "Version" on the XmlSchema type in .NET, which is my intended platform. This is all well and good, but getting i) my files and ii) my application to respect this value is another matter. It would be trivial, as kbrimington suggests, to mandate a 'schema version' attribute in application files. Then I simply match the version attribute from a loaded XML file to a schema, run validation, and have the application throw a fit/politely chide the user/bravely struggle on as appropriate.

edit #2:

In case anyone is interested, I have gone with using the 'version' attribute on the schema, and matching this to a custom Attribute which is applied to a wrapper type. The wrapper retrieves a string representing the schema from a project Resources file (there will be a check to ensure that the version in the schema and the version specified by the Attribute match). The first thing that main() does is build a lookup table of schemas, indexed by version, using Reflection to examine the available version wrapper types. This sounds like an overengineered way of doing things, but I'm trying to think ahead and build in redundancy and flexibility by using several discrete steps into which new functionality could be inserted. Possible improvements include implementing a custom resource manager type to sidestep some of the Heath-Robinson functionality described here.
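A rough analogue of that arrangement, sketched in Python rather than .NET, is shown here. It is not the actual implementation: a class decorator stands in for the custom Attribute, registration happens when each class is defined instead of through a Reflection scan in main(), and the inline schema strings stand in for entries in the Resources file. All names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
SCHEMAS_BY_VERSION = {}  # version string -> wrapper class


def schema_version(version):
    """Class decorator standing in for the custom Attribute; registers the wrapper."""
    def register(cls):
        if version in SCHEMAS_BY_VERSION:
            raise ValueError(f"duplicate wrapper for schema version {version}")
        cls.version = version
        SCHEMAS_BY_VERSION[version] = cls
        return cls
    return register


@schema_version("1.0")
class SchemaV1_0:
    # Stand-in for the schema string fetched from the Resources file.
    xsd = f'<xs:schema xmlns:xs="{XS}" version="1.0"/>'


@schema_version("1.1")
class SchemaV1_1:
    xsd = f'<xs:schema xmlns:xs="{XS}" version="1.1"/>'


def check_versions():
    """Confirm each schema's own 'version' attribute matches its wrapper's."""
    for version, wrapper in SCHEMAS_BY_VERSION.items():
        declared = ET.fromstring(wrapper.xsd).get("version")
        if declared != version:
            raise ValueError(
                f"wrapper says {version!r} but schema declares {declared!r}"
            )
```

Calling `check_versions()` once at startup gives the consistency check described above, and `SCHEMAS_BY_VERSION["1.1"]` then yields the wrapper for a file that reports version 1.1.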

+2  A: 

Many file formats, XML and otherwise, are designed with forward compatibility in mind. Even the bitmap format has a field in its header that records how large the header is, so that new bitmap formats can be defined with a different header structure.

I would recommend defining at least some invariant rules about your file format. A version indicator could be a namespace, as you suggested, a file extension, or even just an element in a known position in the document.

If you can say "There will always be a <version> element here, regardless of schema, which I can use to determine which version of the schema to use when validating...", then the problem is solved. The point is to have something you can depend on to determine the version, regardless of what else might change.
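As a minimal sketch of that invariant (in Python, purely for illustration), suppose the rule is "the first child of the document element is always `<version>`". That position and the element names in the sample are assumptions for the example, not anything the format requires. The reader checks only that one rule and makes no other assumption about the document.

```python
import xml.etree.ElementTree as ET


def read_version(xml_text):
    """Return the text of <version>, which must be the root's first child."""
    root = ET.fromstring(xml_text)
    children = list(root)
    if not children or children[0].tag != "version":
        raise ValueError("expected <version> as the first child of the root")
    text = (children[0].text or "").strip()
    if not text:
        raise ValueError("<version> element is empty")
    return text
```

The returned string can then be used to pick the schema to validate against, before anything else in the file is interpreted.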

kbrimington