Hi all,

I am thinking about rewriting a schema that contains lots of standalone complex types into one where the complex types extend other base types more sensibly. The rationale is partly conceptual, because most of these types are specific instances of a domain object with a definite hierarchical structure. It is also partly practical: we're using JAXB-generated classes to handle the XML reading logic, and it's impossible to write methods for common functionality without either reflection or a lot of instanceof and casting. Bleh.

So my primary question is whether anyone is aware of a good way to test two XSD schemas for functional equivalence. If I perform this schema refactoring correctly, the set of documents considered valid should be exactly the same for the two schemas, even though the files themselves would be very different. This sounds like the kind of thing that a testing framework could help with; I know there are tools that will suggest test inputs for JUnit tests, and I was wondering whether there might be any tools to generate edge-case XML documents to test for validity against the old and new schemata.

And as an aside - if this is a terrible idea (or if there are better alternatives), then stop me now. :-)

Thanks for your attention.

+1  A: 

After some investigation, I've come across the Liquid XML Sample Generator, which sounds like exactly the kind of thing I'd be looking for (though I'm not sure quite how good this would be for testing edge cases).

I also stumbled across SUT: XML Schema Unit Test, which does not look to be actively maintained any more, or well documented, but takes a more programmatic approach to defining what should be valid or invalid XML documents for a given schema, and testing this. Unfortunately, test cases seem to be interpreted linearly rather than permuting the possibilities for each distinct node as the above XML sample generator would do. Consequently, while it would be possible to enforce better coverage, it would likely be a lot of work to do so.

So at least I have what appears to be a workable fallback - generating a bunch of documents from the old schema and then validating them against the new schema, and vice versa. It's not definitive (well, unit tests never are anyway) but running a few hundred or thousand tests in each direction should give some pretty good confidence that the schemata are equivalent.
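
For what it's worth, the "validate in both directions" check can be sketched with the standard javax.xml.validation API. This is only a rough sketch, not a finished tool: the class name SchemaAgreement and its helper methods are names I made up for illustration, and it assumes each schema and document is available as a string (in practice you would probably read them from files). It reports whether the old and new schema give the same verdict for one document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names here are illustrative only.
public class SchemaAgreement {

    /** Compiles an XSD supplied as a string into a reusable Schema object. */
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    /** Returns true if the document validates, false if validation fails. */
    static boolean isValid(Schema schema, String xml) {
        try {
            // With no ErrorHandler set, validation errors surface as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Covers schema violations and malformed XML alike.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True when both schemas accept the document, or both reject it. */
    static boolean agree(Schema oldSchema, Schema newSchema, String xml) {
        return isValid(oldSchema, xml) == isValid(newSchema, xml);
    }
}
```

Running agree over every generated or captured document and collecting the ones where it returns false gives a list of counterexamples to inspect. Note that a document that is malformed XML is rejected by both schemas and therefore counts as agreement, so it's worth making sure the corpus contains plenty of documents that at least one schema accepts.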

I would still be very happy to have an elegant testing solution presented, but after an hour of Googling I'm not holding out much hope that something along those lines exists.

Andrzej Doyle
Ultimately, given that there don't seem to be any silver bullets, I used a corpus of XML documents made up of some captured from production and a lot generated by Liquid XML. While this did not (and could not, as xcut notes) give 100% confidence, it certainly went a long way.
Andrzej Doyle
+1  A: 

Hi Andrzej; there is no general answer to this problem, because the question of whether two context-free grammars are equivalent is undecidable. See this Wikipedia page.

There has been some theoretical work done on this, but there is no tool that is packaged off the shelf to do this for you.

You can solve this problem if you make certain assumptions about how your schema is designed (forbidding anonymous types, following naming conventions, and so on). Then you can use a library like Eclipse XSD (which works just fine outside Eclipse in a standalone app) to perform the comparison.

Finally, here's a link to a research paper that discusses the problem in more detail.

xcut
Thanks for an interesting and informative answer that deserves more than the one upvote I can give it. In particular it's interesting to know that it's an undecidable problem - which makes empirical testing with a large sample more acceptable.
Andrzej Doyle