The Facts

In my Java application I have to handle XML files with different schema versions (xsd files) simultaneously. The content of the XML files changed only a little between the versions, so I'd like to use mostly the same code to handle them and make only a few case distinctions depending on the version of the schema used.

Current Solution

Right now I'm parsing the XML files with a SAX parser and my own ContentHandler, ignoring the schema version and just checking whether the tags I need for processing are present.

Possible Alternative

I'd really like to use JAXB to generate the classes for parsing the XML files. This way I could remove all the hardcoded strings (constants) from my Java code and work with the generated classes instead.

Question(s)

  • How can I handle different schema versions in a unified way using JAXB?
  • Is there a better solution?


Progress

I compiled the schema versions to different packages v1, v2 and v3. Now I can create an Unmarshaller this way:

JAXBContext jc = JAXBContext.newInstance( 
    v1.Root.class, v2.Root.class, v3.Root.class );
Unmarshaller u = jc.createUnmarshaller();

Now u.unmarshal( xmlInputStream ); gives me the Root class from the package matching the schema of the XML file.

Next I'll try to define an interface to access the common parts of the schemas. If you have done something like this before, please let me know. In the meantime I'm reading through the JAXB specs...
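A minimal sketch of what such an interface could look like. V1Root and V2Root are hand-written stand-ins for the generated v1.Root and v2.Root classes, and the name and description fields are hypothetical; the real generated classes will differ. CommonRoot and CommonRoots are likewise illustrative names, not part of JAXB.

```java
import java.util.Optional;

// Stand-ins for the JAXB-generated v1.Root and v2.Root. The fields are
// hypothetical and only illustrate a schema that gained an element in v2.
class V1Root {
    private final String name;
    V1Root(String name) { this.name = name; }
    String getName() { return name; }
}

class V2Root {
    private final String name;
    private final String description; // null when the optional element is omitted
    V2Root(String name, String description) {
        this.name = name;
        this.description = description;
    }
    String getName() { return name; }
    String getDescription() { return description; }
}

// Version-independent view used by the rest of the application.
interface CommonRoot {
    String name();
    Optional<String> description(); // always empty for v1 documents
}

final class CommonRoots {
    private CommonRoots() {}

    // Takes the object returned by Unmarshaller.unmarshal(...) and picks the
    // adapter that matches its concrete (version-specific) class.
    static CommonRoot wrap(Object unmarshalled) {
        if (unmarshalled instanceof V1Root) {
            final V1Root root = (V1Root) unmarshalled;
            return new CommonRoot() {
                public String name() { return root.getName(); }
                public Optional<String> description() { return Optional.empty(); }
            };
        }
        if (unmarshalled instanceof V2Root) {
            final V2Root root = (V2Root) unmarshalled;
            return new CommonRoot() {
                public String name() { return root.getName(); }
                public Optional<String> description() {
                    return Optional.ofNullable(root.getDescription());
                }
            };
        }
        throw new IllegalArgumentException("Unsupported root type: "
                + (unmarshalled == null ? "null" : unmarshalled.getClass().getName()));
    }
}
```

With this shape, the version-specific case distinctions stay inside the adapters, and calling code only sees CommonRoot.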

+1  A: 

First, you need some way to identify the schema appropriate for the particular instance document. You say that the documents have a schemaLocation attribute, so this is one solution. Note, however, that you have to specifically configure the parser to use this attribute, and a malicious document could specify a schema location that you don't control. Instead, I'd recommend getting the attribute value, and using it to find the appropriate schema in an internal table.
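The table lookup could be sketched roughly like this, using only the StAX API that ships with the JDK. The class name, the example.com locations and the /schemas/... resource paths are made up for illustration; the point is that the attribute value is only used as a key and is never fetched.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class SchemaVersionLookup {
    // Hypothetical internal table: declared location -> schema resource we control.
    static final Map<String, String> KNOWN_SCHEMAS = Map.of(
            "http://example.com/schema/v1/root.xsd", "/schemas/root-v1.xsd",
            "http://example.com/schema/v2/root.xsd", "/schemas/root-v2.xsd");

    // Reads the schema location declared on the root element without
    // resolving it. Returns empty if the document declares none.
    static Optional<String> declaredLocation(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            // Do not let the document pull in DTDs or external entities.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String xsi = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;
                    String plain = reader.getAttributeValue(xsi, "noNamespaceSchemaLocation");
                    if (plain != null) {
                        return Optional.of(plain.trim());
                    }
                    // xsi:schemaLocation holds "namespace location" pairs;
                    // this sketch only looks at the first pair.
                    String pairs = reader.getAttributeValue(xsi, "schemaLocation");
                    if (pairs != null) {
                        String[] tokens = pairs.trim().split("\\s+");
                        if (tokens.length >= 2) {
                            return Optional.of(tokens[1]);
                        }
                    }
                    return Optional.empty(); // only the root element is inspected
                }
                return Optional.empty();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Maps the declared location to a trusted schema; unknown locations yield empty.
    static Optional<String> trustedSchemaFor(String xml) {
        return declaredLocation(xml).map(KNOWN_SCHEMAS::get);
    }
}
```

A document that points at a location outside the table simply gets no schema, and the caller can reject it.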

Next is access to the data. You don't say why you're using three different schemas. The only rational reason is an evolving data spec (i.e., the schemas represent versions 1, 2, and 3 of the same data). If that's not your reason, then you need to rethink your design.

If you are trying to support an evolving data spec, then you need to answer the question "how do I deal with data that's missing?" There are a couple of answers to this. One is to maintain multiple versions of the code. With refactoring of common functionality, this is not a bad idea, but it can easily become unmaintainable.

The alternative is to use a single codebase, and some sort of adapter object that incorporates your rules. And if you go down this path, JAXB is the wrong solution, since it is tied to a schema. You might be able to use a permissive XML->Java converter: I believe XStream will work, and I know that the 1.1 release of Practical XML will work (since I wrote it) -- although you'd have to build it yourself.

Another, better alternative, depending on the complexity of the schema, is to develop a set of objects that use XPath to retrieve the data. I would probably implement this using a "master" object that contains XPath expressions for every field, in every variant of the schema. Then create lightweight "wrapper" objects that hold a DOM version of your instance document, and use the XPath appropriate to the schema. Note, however, that this is limited to read-only access.
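As a rough sketch of that idea, using the DOM and XPath APIs from the JDK: the version keys, field names and XPath expressions below are invented for illustration, as is the XPathBackedRoot class name. A real table would hold one entry per field for each schema variant.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathBackedRoot {
    // "Master" table: schema version -> field name -> XPath expression.
    // All entries are hypothetical examples.
    static final Map<String, Map<String, String>> PATHS = Map.of(
            "v1", Map.of("name", "/root/name", "owner", "/root/author"),
            "v2", Map.of("name", "/root/title", "owner", "/root/owner/@id"));

    private final Document document;
    private final Map<String, String> paths;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    XPathBackedRoot(String version, Document document) {
        Map<String, String> forVersion = PATHS.get(version);
        if (forVersion == null) {
            throw new IllegalArgumentException("Unknown schema version: " + version);
        }
        this.paths = forVersion;
        this.document = document;
    }

    // Parses the instance document into a DOM and wraps it (read-only access).
    static XPathBackedRoot parse(String version, String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new XPathBackedRoot(version, dom);
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    // Evaluates the XPath registered for this field in this schema version.
    // XPath.evaluate returns an empty string when nothing matches.
    String get(String field) {
        String expression = paths.get(field);
        if (expression == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        try {
            return xpath.evaluate(expression, document);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath for field " + field, e);
        }
    }
}
```

Calling code asks for get("owner") regardless of version; only the table knows that v1 stores it in an element and v2 in an attribute.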

kdgregory
+1: Thanks for your answer. I'll have a look at your suggestions. And yes, the different versions are evolving data specs.
tangens