I have a well-formed XML document and a corresponding XML Schema document. I would like to determine the location within the XML document that causes it to fail validation against the schema, but I could not figure out how to do this using the standard validation approach in Java:

SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(... /* the .xsd source */);
Validator validator = schema.newValidator();
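DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
// parse() is an instance method, so call it on the builder, not on the class
Document document = documentBuilder.parse(... /* the .xml source */);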
try {
    validator.validate(new DOMSource(document));
    ...
} catch (SAXParseException e) {
    ...
}

I have toyed with the idea of getting at least the line and column number from SAXParseException, but they're always set to -1, -1 on validation error.

A: 

You won't get schema validation errors reported this way. If you really want line and column information, you need to set an error handler, i.e.

try {
   validator.setErrorHandler(handler);
   validator.validate(...);
} catch (SAXParseException e) {
   // Use handler info, or log it in handler
}

Here's the interface you need to implement: ErrorHandler
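
As a rough sketch of what such a handler could look like (the class name CollectingErrorHandler, the validate helper and the sample schema and document are made up for illustration, not taken from the question), the following collects every reported problem together with its position. It assumes the input is handed to the validator as a StreamSource; with a DOMSource the parser position is no longer available, so the line and column come back as -1.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper: records problems instead of stopping at the first one.
class CollectingErrorHandler implements ErrorHandler {
    // Illustrative schema and document; the price value is deliberately invalid.
    static final String SAMPLE_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='price' type='xs:decimal'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String SAMPLE_XML = "<order>\n  <price>asdf9.90</price>\n</order>";

    final List<SAXParseException> problems = new ArrayList<>();

    @Override public void warning(SAXParseException e) { problems.add(e); }
    @Override public void error(SAXParseException e) { problems.add(e); }
    // Fatal errors (for example malformed XML) cannot be recovered from, so rethrow.
    @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }

    // Validates the XML text against the schema text and returns the problems found.
    static List<SAXParseException> validate(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        // Assumption: a stream source is used so the parser position is still known.
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        for (SAXParseException e : validate(SAMPLE_XSD, SAMPLE_XML)) {
            System.out.println("line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage());
        }
    }
}
```

Because error() only records the exception, validation carries on after the first problem, so one pass can report several invalid values.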

xcut
I've also tried that. It too returns -1 for row and column:
error exception: cvc-complex-type.3.2.2: Attribute 'xsi:noNamespaceSchemaLocation' is not allowed to appear in element 'shiporder'., line: -1, column: -1
error exception: cvc-datatype-valid.1.2.1: 'asdf9.90' is not a valid value for 'decimal'., line: -1, column: -1
error exception: cvc-type.3.1.3: The value 'asdf9.90' of element 'price' is not valid., line: -1, column: -1
You mean the error() and warning() methods in the ErrorHandler are passed -1 in the exception info? That's strange and should not be happening. Perhaps try a different SAX parser implementation?
xcut
You're right: with DOM the line and column numbers are not set, with SAX they are. With the line and column numbers though, I still have the problem of working backwards through a character stream to find the node or attribute that failed. Also, the column number seems to point to the end of the close tag, and I don't know whether that is the case with every parser implementation.
I remember something about having to pass xmlOptions.setLoadLineNumbers() to the XMLParser to make it work with XMLBeans. Maybe you need to do that with the SAX parser too.
Stroboskop
A: 

Take a look at the https://jaxb2-commons.dev.java.net/xpath-tracker/ project. It was designed for JAXB, but I think it would work outside of JAXB as well, since it is an XMLFilter.

Dave
A: 

A DOM does not retain information about its source. In most cases that information is irrelevant, and a DOM is meant to be manipulated (i.e., any location information would become incorrect once the tree changes).

The solution is to validate at the time you parse: call DocumentBuilderFactory.setSchema() before creating the DocumentBuilder.
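
A minimal sketch of that approach, assuming the schema and document are available as strings (the class name ValidatingDomParse, the parseAndValidate helper and the sample data are invented for illustration). It also assumes the factory is made namespace-aware, which schema validation generally needs, and that an ErrorHandler is set on the builder to pick up the positions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical example class: validates while the DOM is being built.
class ValidatingDomParse {
    // Illustrative schema and document; the price value is deliberately invalid.
    static final String SAMPLE_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='price' type='xs:decimal'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String SAMPLE_XML = "<order>\n  <price>asdf9.90</price>\n</order>";

    // Parses the XML text into a DOM and returns the validation messages with positions.
    static List<String> parseAndValidate(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: needed for schema validation to work
        factory.setSchema(schema);       // must be set before the builder is created
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            private void record(SAXParseException e) {
                problems.add("line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            }
            @Override public void warning(SAXParseException e) { record(e); }
            @Override public void error(SAXParseException e) { record(e); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });

        // The returned Document is ignored here; only the recorded problems are of interest.
        builder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        for (String problem : parseAndValidate(SAMPLE_XSD, SAMPLE_XML)) {
            System.out.println(problem);
        }
    }
}
```

Since validation happens while the parser still knows its position in the input, the exceptions carry real line and column numbers, and the resulting Document can still be used afterwards.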

Anon