Can I get something like the node structure from the validator, for example through a listener or a handler? The exception alone is not enough: I have to select the node where the error occurred. This is what I have built so far.

import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.SAXException

def factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
def schema = factory.newSchema(new StreamSource(new FileReader("src/import.xsd")))
def validator = schema.newValidator()
try {
    validator.validate(new StreamSource(new FileReader("src/import.xml")))
    println "everything is fine"
} catch (SAXException e) {
    println e
}

Thank you.

+1  A: 

Validate while you parse. Here's the code in Java; translating it to Groovy should be pretty straightforward:

InputStream xml = // …
InputStream xsd = // …

// Compile the XSD once; the Schema object can be reused
SchemaFactory xsFact = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = xsFact.newSchema(new StreamSource(xsd));

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);   // required for schema validation
dbf.setValidating(false);      // turns off DTD validation only; the schema below still applies
dbf.setSchema(schema);
DocumentBuilder db = dbf.newDocumentBuilder();

Document dom = db.parse(new InputSource(xml));
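
To see what the parser reports while it validates, one option is to register an ErrorHandler on the DocumentBuilder. The following is only a sketch under assumptions: the class name ValidateWhileParsing, the validate helper, and the inline SAMPLE_XSD (modelled on the 'stringLength5' type from the error message quoted in this thread) are made up for illustration. As far as I know, the JDK's default handler only prints validation errors and carries on parsing, so a handler like this is what lets you capture them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidateWhileParsing {

    // Hypothetical schema for illustration: <code> must be exactly 5 characters long.
    static final String SAMPLE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:simpleType name='stringLength5'>"
      + "<xs:restriction base='xs:string'><xs:length value='5'/></xs:restriction>"
      + "</xs:simpleType>"
      + "<xs:element name='code' type='stringLength5'/>"
      + "</xs:schema>";

    // Parses xml against xsd and returns one "line:column message" entry per reported problem.
    static List<String> validate(String xsd, String xml) throws Exception {
        SchemaFactory xsFact = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = xsFact.newSchema(new StreamSource(new StringReader(xsd)));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setSchema(schema);
        DocumentBuilder db = dbf.newDocumentBuilder();

        List<String> problems = new ArrayList<>();
        db.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { record(e); }
            @Override public void error(SAXParseException e) { record(e); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                record(e);
                throw e; // the document is not well-formed, so stop parsing
            }
            private void record(SAXParseException e) {
                problems.add(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
            }
        });

        db.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // 'aaaaaa' has six characters, so the length facet is violated
        for (String problem : validate(SAMPLE_XSD, "<code>aaaaaa</code>")) {
            System.out.println(problem);
        }
    }
}
```

The handler still only receives a line and a column, not a node, so this narrows the error down to a position in the source text rather than to an element of the DOM.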
kdgregory
Okay, now I have the DOM of the XML, but how do I know where the error was? I mean something like: cvc-length-valid: Value 'aaaaaa' with length = '6' is not facet-valid with respect to length '5' for type 'stringLength5'.
codedevour
SAXParseException (the SAXException subclass that carries location information) tracks the line and column number of the error, so it relates back to the original file, not the element. After responding, I checked the sources to see if you could get more information by casting to the implementation type, but that does not appear to be the case. All I can say is I feel your pain; it's a definite oversight in the API.
kdgregory
And in case it wasn't clear: the 'db.parse()' call should be wrapped in a try/catch.
kdgregory
It was clear; I wrapped it in try/catch :) Sad that this isn't possible. Thank you for your effort.
codedevour
A: 

Catch SAXParseException to get more detail about the error, or use the SAX Locator if you're implementing ContentHandler, possibly combined with a lexer. The exception gives you the error message along with the line and column number.

try{
  ...
}
catch(SAXParseException e){
   int lineNumber = e.getLineNumber();
   int columnNumber = e.getColumnNumber();
   String message = e.getMessage();
   // do something
}
catch(SAXException e){
   // what should we do?
   // if we're implementing ContentHandler 
   // we can use the org.xml.sax.Locator to get more info
}

Usually the column information from the Locator returns -1. For offset precision, you'll have to either use an extended ContentHandler or a lexer:

  • Get the line number of the error
  • Estimate the position of the node from the line information and the start and end tags, using a lexer, regular expressions, or something else.
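
As a rough illustration of the "extended ContentHandler" route, here is a sketch that keeps a stack of open elements and forwards every SAX event to a ValidatorHandler, so that when an error is reported the stack names the element being validated. It is a sketch under assumptions, not a definitive implementation: the names ElementPathValidation, PathTracker, validate and SAMPLE_XSD are hypothetical, the path carries no positional index, and it relies on content errors being reported while the end tag is processed, which is what I'd expect from the JDK's built-in validator.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class ElementPathValidation {

    // Hypothetical schema for illustration: import/item/code, where code must be 5 characters.
    static final String SAMPLE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:simpleType name='stringLength5'>"
      + "<xs:restriction base='xs:string'><xs:length value='5'/></xs:restriction>"
      + "</xs:simpleType>"
      + "<xs:element name='import'><xs:complexType><xs:sequence>"
      + "<xs:element name='item' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='code' type='stringLength5'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Remembers which elements are open while passing every event on to the next handler.
    static class PathTracker extends XMLFilterImpl {
        private final Deque<String> open = new ArrayDeque<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            open.addLast(local); // push first, so attribute errors see this element
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, local, qName); // content errors surface here, before the pop
            open.removeLast();
        }

        String path() {
            return "/" + String.join("/", open);
        }
    }

    // Returns one "path (line N): message" entry per validation error.
    static List<String> validate(String xsd, String xml) throws Exception {
        SchemaFactory xsFact = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = xsFact.newSchema(new StreamSource(new StringReader(xsd)));

        PathTracker tracker = new PathTracker();
        ValidatorHandler validatorHandler = schema.newValidatorHandler();
        tracker.setContentHandler(validatorHandler);

        List<String> problems = new ArrayList<>();
        validatorHandler.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                problems.add(tracker.path() + " (line " + e.getLineNumber() + "): " + e.getMessage());
            }
        });

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // ValidatorHandler expects namespace-aware events
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(tracker);
        reader.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<import>\n<item><code>aaaaa</code></item>\n<item><code>aaaaaa</code></item>\n</import>";
        validate(SAMPLE_XSD, xml).forEach(System.out::println);
    }
}
```

To tell repeated siblings apart, the tracker could also count children per parent and emit something like item[2]; that is left out here to keep the sketch short.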
John Doe
A: 

Depending on how much control you have over the environment, there is a somewhat clunky way to do this. The Xerces2 XML parser, which is a drop-in replacement for the default parser, has a property on the Validator that returns the current node, so if you keep a reference to the Validator (as a field of an ErrorHandler that you set on the Validator, for example) you can get the node structure. Here's how I did it in Java:

...
  Validator validator = schema.newValidator();
  validator.setErrorHandler(new MyErrorHandler(validator));
...



public class MyErrorHandler implements ErrorHandler {
  private final Validator validator;

  public MyErrorHandler(Validator v) {
    validator = v;
  }

  @Override
  public void error(SAXParseException e) throws SAXException {
    Element element = null;
    try {
      // Xerces-specific property: the element being validated when the error was reported
      element = (Element) validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");
    } catch (SAXNotRecognizedException saxnre) {
      log(Level.SEVERE, "Xerces2 XML parser is required", saxnre);
    } catch (SAXNotSupportedException saxnse) {
      // shouldn't happen in this context
    }
    ... // do stuff
  }
  ...
}
Jim E-H