I need to validate XML files against a wide array of constraints: type and/or format of an element's text, co-occurrences, date comparisons and date math, as well as some user-defined rules from a database (e.g. element X can only contain child elements A, B, and C), and I am not sure how to go about it.

The current incarnation of this application enforces these constraints via Perl, though I feel that the Perl code duplicates a lot of the functionality that I would get out of the box using XSD or RELAX NG. Unfortunately, using either of these would result in error messages that would be cryptic to the end users. Mapping these cryptic messages to something friendly seems impossible (other than providing line/col numbers).

Then there's Schematron. It allows me to generate friendly messages and check constraints that the aforementioned schema langs can't. Unfortunately, type/format checking and date math become big XSLT template hacks.

Now I'm not sure what to do.

A combination of Schematron and, say, RELAX NG seems to be the best approach, yet the errors generated by RELAX NG make it impossible to provide anything informative to the end user.

I had hoped to use a master schema template that would be modified based on the custom rules in the DB.

Is keeping the original approach the best, or should I move forward using Schematron/RELAX NG and wrestling the format/type enforcement into XSLT templates?

A: 

On a practical note, it seems to me that your task really isn't one of XML validation, and probably shouldn't be forced into one.

Your description reads like you have a lot of domain-specific, and perhaps even circumstance-specific, validation of the data contained within the XML documents, not really of the XML documents per se. As such, I'd say: write a validator in code that parses the documents and applies your complex, data-driven validation suite. I suspect you'll be able to give much better feedback to the users, since your code will have semantic knowledge of the domain.
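To make that concrete, here is a minimal sketch in Python (not the Perl from the question) of one data-driven check: the "element X can only contain A, B, and C" rule. The `ALLOWED_CHILDREN` table and the function name are made up for illustration; in practice the table would be loaded from the rules database, whose layout I don't know.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, standing in for rows loaded from the database:
# parent element name -> set of child element names it may contain.
ALLOWED_CHILDREN = {"X": {"A", "B", "C"}}


def check_allowed_children(root, rules):
    """Return human-readable messages for every disallowed child element."""
    problems = []
    for parent in root.iter():
        allowed = rules.get(parent.tag)
        if allowed is None:
            continue  # no rule is defined for this element
        for child in parent:
            if child.tag not in allowed:
                problems.append(
                    f"<{parent.tag}> may only contain "
                    f"{', '.join(sorted(allowed))}, but found <{child.tag}>."
                )
    return problems


doc = ET.fromstring("<root><X><A/><D/></X></root>")
messages = check_allowed_children(doc, ALLOWED_CHILDREN)
for message in messages:
    print(message)
```

Because the message is built by your own code, it can name the rule and the offending element in whatever wording suits your users, which is the part the schema validators make hard.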

I suppose you could use an XML schema system as the "first pass" of such a system, but if in the end you need to parse and load the data anyway, I've usually found that schema validation adds nothing, since the code that parses essentially has to validate anyway.

MtnViewMark
The domain is the XML document. Well, there are user-defined rules, but they account for a tiny fraction of the overall requirements. Validation is used to enforce an elaborate XML specification that I have no control over. If the given instance doc passes, it's sent out, never to be heard from again. If it fails, it's corrected and sent. The rules serve no other purpose.
Perhaps we don't understand your situation well enough. Neither "date comparisons and date math" nor "some user defined rules from a database" sounds like a good candidate for standard XML validation languages. I understand now that you don't need to load the data for some other use. So I think the two-pass approach might be best: use XML validation (with whichever schema language seems easiest), and then parse the doc and perform the data checks (like date math).
MtnViewMark
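A rough sketch of the two-pass idea, again in Python and under loud assumptions: the `order`, `start`, and `end` element names and the `id` attribute are invented for the example, dates are assumed to be ISO 8601 (`YYYY-MM-DD`), and the first pass only checks well-formedness with the standard library as a stand-in for a real RELAX NG or XSD validator.

```python
import xml.etree.ElementTree as ET
from datetime import date


def first_pass(xml_text):
    """Structural pass. Stand-in only: checks well-formedness, not a schema."""
    try:
        return ET.fromstring(xml_text), []
    except ET.ParseError as err:
        line, col = err.position
        return None, [
            f"The document is not well-formed XML (line {line}, column {col})."
        ]


def second_pass(root):
    """Data pass: the date comparison that schema languages handle poorly."""
    errors = []
    for order in root.iter("order"):  # hypothetical element name
        label = order.get("id", "?")
        try:
            start = date.fromisoformat(order.findtext("start", "").strip())
            end = date.fromisoformat(order.findtext("end", "").strip())
        except ValueError:
            errors.append(
                f"Order {label}: start and end must be dates like 2024-03-01."
            )
            continue
        if end < start:
            errors.append(
                f"Order {label}: end date {end} is earlier than start date {start}."
            )
    return errors


def validate(xml_text):
    """Run the data checks only if the structural pass succeeded."""
    root, errors = first_pass(xml_text)
    if root is None:
        return errors
    return second_pass(root)


sample = (
    '<orders><order id="7">'
    "<start>2024-03-10</start><end>2024-03-01</end>"
    "</order></orders>"
)
print(validate(sample))
```

Keeping the passes separate means the structural validator can be swapped out without touching the data checks, and the friendly wording lives entirely in the second pass.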