I'm trying to validate an XML file against a number of different schemas (apologies for the contrived example):

  • a.xsd
  • b.xsd
  • c.xsd

c.xsd in particular imports b.xsd and b.xsd imports a.xsd, using:

<xs:include schemaLocation="b.xsd"/>

I'm trying to do this via Xerces in the following manner:

XMLSchemaFactory xmlSchemaFactory = new XMLSchemaFactory();
Schema schema = xmlSchemaFactory.newSchema(new StreamSource[] {
        new StreamSource(this.getClass().getResourceAsStream("a.xsd"), "a.xsd"),
        new StreamSource(this.getClass().getResourceAsStream("b.xsd"), "b.xsd"),
        new StreamSource(this.getClass().getResourceAsStream("c.xsd"), "c.xsd") });
Validator validator = schema.newValidator();
validator.validate(new StreamSource(new StringReader(xmlContent)));

but this fails to load all three schemas correctly, resulting in the error "cannot resolve the name 'blah' to a(n) 'group' component".

I've validated this successfully using Python, but I'm having real problems with Java 6.0 and Xerces 2.8.1. Can anybody suggest what's going wrong here, or an easier approach to validating my XML documents?

A:

The schema stuff in Xerces is (a) very, very pedantic, and (b) gives utterly useless error messages when it doesn't like what it finds. It's a frustrating combination.

The schema support in Python may be a lot more forgiving, and may have let small errors in the schema go unreported.

Now if, as you say, c.xsd includes b.xsd, and b.xsd includes a.xsd, then there's no need to load all three into the schema factory. Not only is it unnecessary, it will likely confuse Xerces and result in errors, so this may be your problem. Just pass c.xsd to the factory, and let it resolve b.xsd and a.xsd itself, which it should do relative to c.xsd.
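A minimal sketch of that suggestion, using the standard JAXP `SchemaFactory` rather than instantiating Xerces' `XMLSchemaFactory` directly. The idea is to hand the factory the root schema as a `URL` (for example from `getClass().getResource("c.xsd")`), so relative `schemaLocation` values have a real base to resolve against; a bare system ID such as "c.xsd" on a `StreamSource` likely won't point at the directory that holds b.xsd and a.xsd. The class and method names `RootSchemaValidation`, `loadRootSchema` and `isValid` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class RootSchemaValidation {

    // Compiles only the root XSD; xs:include/xs:import locations inside it
    // (and inside whatever it includes) are resolved relative to rootXsd.
    public static Schema loadRootSchema(URL rootXsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(rootXsd);
    }

    // Returns true if xmlContent is valid against the schema; otherwise
    // prints the validator's message and returns false.
    public static boolean isValid(Schema schema, String xmlContent) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xmlContent)));
            return true;
        } catch (SAXException e) {
            System.err.println("Validation failed: " + e.getMessage());
            return false;
        }
    }
}
```

In a unit test this would typically be called as `loadRootSchema(getClass().getResource("c.xsd"))`, assuming c.xsd, b.xsd and a.xsd sit in the same classpath directory.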

skaffman
Yeah this seems to result in the same error too. I'm wondering whether the import declarations in the schema files are causing issues... It doesn't help that two of the schemas have no target namespace either... gargh
Jon
Maybe one way to resolve this is to write a ResourceResolver and set it on the schema factory...
Jon
Are you sure you're not mixing up import and include? They mean two different things, and shouldn't be confused. Are a, b and c in different namespaces? If so, then they should be imported, not included. If they're in the same namespace, they should be included.
skaffman
I didn't write the schemas, nor can I change them. include is used, and they are in different namespaces; I'm not quite sure why. In the end I had to write a custom resolver and load only the root schema to get this to work... but thanks for the pointer on loading the root schema anyway.
Jon
A:

So, just in case anybody else runs into the same issue: I needed to load a parent schema (and the child schemas it includes) as a resource from a unit test, in order to validate an XML string. I used the Xerces XMLSchemaFactory to do this, along with the Java 6 validator.

To load the child schemas correctly via xs:include, I had to write a custom resource resolver. The code can be found here:

http://pbin.oogly.co.uk/listings/viewlistingdetail/2a70d763929ce3053085bfaa1d78e2

To use the resolver, set it on the schema factory:

xmlSchemaFactory.setResourceResolver(new ResourceResolver());

The factory will then use it to resolve included schemas from the classpath (in my case from src/main/resources). Any comments on this are welcome.
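The linked code isn't reproduced here, so the following is a hypothetical sketch of a classpath-based `LSResourceResolver` (the interface `SchemaFactory.setResourceResolver` accepts), not Jon's original. It assumes every schema lives in a single classpath directory and keeps only the file name from each `systemId`. `LSInput` objects are created with `DOMImplementationLS.createLSInput()`; the cast to `DOMImplementationLS` assumes the JDK's built-in (Xerces-derived) DOM implementation, which supports it.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical resolver: serves xs:include / xs:import targets from one
// classpath directory (e.g. "schemas/" under src/main/resources).
public class ClasspathResourceResolver implements LSResourceResolver {

    private final ClassLoader loader;
    private final String baseDir;          // e.g. "schemas/" (ClassLoader names take no leading '/')
    private final DOMImplementationLS ls;  // only used to create LSInput instances

    public ClasspathResourceResolver(ClassLoader loader, String baseDir)
            throws ParserConfigurationException {
        this.loader = loader;
        this.baseDir = baseDir;
        this.ls = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
                                   String publicId, String systemId, String baseURI) {
        if (systemId == null) {
            return null; // nothing to look up: parser falls back to default resolution
        }
        // Keep only the file name, so "b.xsd" and "../x/b.xsd" both map to baseDir + "b.xsd".
        String name = baseDir + systemId.substring(systemId.lastIndexOf('/') + 1);
        InputStream in = loader.getResourceAsStream(name);
        if (in == null) {
            return null; // not on the classpath: parser falls back to default resolution
        }
        LSInput input = ls.createLSInput();
        input.setByteStream(in);
        input.setPublicId(publicId);
        input.setSystemId(systemId);
        input.setBaseURI(baseURI);
        return input;
    }
}
```

Wiring it up would look something like `factory.setResourceResolver(new ClasspathResourceResolver(MyTest.class.getClassLoader(), "schemas/"))` followed by `factory.newSchema(new StreamSource(loader.getResourceAsStream("schemas/c.xsd"), "c.xsd"))`, where `MyTest` and `schemas/` are placeholders. Returning `null` from `resolveResource` tells the parser to open the system identifier itself.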

Jon
Any chance of elaborating on this a bit further as to how the custom resource resolver makes this all work? Thanks.
Casey
A: 

I'm having a similar issue. Does implementing a ResourceResolver solve the following known bug: https://issues.apache.org/jira/browse/XERCESJ-1130

If not, is there a workaround? I can't modify the XSDs.

Regarding the solution linked above: when I try to use it, the following line returns null, which later causes an exception. I might just be missing something here; any ideas?

InputStream resourceAsStream = this.getClass().getResourceAsStream(systemId);
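One common cause of that `null`: `Class.getResourceAsStream` treats a name without a leading `/` as relative to the package of the class it is called on, so "b.xsd" is looked up next to that class rather than at the classpath root. The `systemId` handed to the resolver is also typically the `schemaLocation` value as written, which may contain directories. A hypothetical helper (the name `ResourceNames.toClasspathName` is illustrative) that maps a `systemId` to an absolute classpath name, assuming all XSDs live in one classpath directory:

```java
public class ResourceNames {

    // Hypothetical helper: maps a systemId such as "b.xsd", "../common/b.xsd"
    // or "file:/tmp/build/b.xsd" to an absolute classpath name under baseDir.
    // Only the file name is kept, so this assumes every XSD lives in baseDir.
    // Returns null for a null systemId; callers should check before using it.
    public static String toClasspathName(String systemId, String baseDir) {
        if (systemId == null) {
            return null;
        }
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        // A leading '/' makes Class.getResourceAsStream search from the classpath
        // root instead of from the calling class's package directory.
        String dir = baseDir.startsWith("/") ? baseDir : "/" + baseDir;
        return dir.endsWith("/") ? dir + fileName : dir + "/" + fileName;
    }
}
```

With it, `getClass().getResourceAsStream(ResourceNames.toClasspathName(systemId, "/schemas/"))` would look in `/schemas/` on the classpath regardless of the calling class's package.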

JJC
A: 

See the section 'Multiple schemas for a single document' at http://www.kdgregory.com/index.php?page=xml.parsing

My solution based on that document:

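import java.io.StringReader;
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

URL xsdUrlA = this.getClass().getResource("a.xsd");
URL xsdUrlB = this.getClass().getResource("b.xsd");
URL xsdUrlC = this.getClass().getResource("c.xsd");

// newInstance is static, so call it on the SchemaFactory class itself
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

// Wrapper schema that includes all three XSDs. toExternalForm() keeps the
// full URL (protocol included), so this also works for XSDs inside a jar.
String W3C_XSD_TOP_ELEMENT =
    "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
    + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">\n"
    + "<xs:include schemaLocation=\"" + xsdUrlA.toExternalForm() + "\"/>\n"
    + "<xs:include schemaLocation=\"" + xsdUrlB.toExternalForm() + "\"/>\n"
    + "<xs:include schemaLocation=\"" + xsdUrlC.toExternalForm() + "\"/>\n"
    + "</xs:schema>";
Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(W3C_XSD_TOP_ELEMENT), "xsdTop"));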
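The wrapper above relies on xs:include, which only accepts schemas whose target namespace matches the including schema's, or schemas with no target namespace (these take on the includer's namespace). Since the wrapper has no targetNamespace, it suits schemas without one. For schemas in different namespaces, as described in the comments on the first answer, the wrapper would presumably need xs:import instead. A hypothetical sketch of that variant (`WrapperSchema`, `buildImportWrapper` and `compile` are illustrative names), mapping each namespace to the URL of its XSD:

```java
import java.io.StringReader;
import java.net.URL;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class WrapperSchema {

    // Builds a no-namespace wrapper schema with one xs:import per
    // (namespace -> XSD URL) entry. Values are not XML-escaped, so URLs or
    // namespaces containing '&', '<' or '"' would need escaping first.
    public static String buildImportWrapper(Map<String, URL> xsdByNamespace) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n");
        for (Map.Entry<String, URL> e : xsdByNamespace.entrySet()) {
            sb.append("  <xs:import namespace=\"").append(e.getKey())
              .append("\" schemaLocation=\"").append(e.getValue().toExternalForm())
              .append("\"/>\n");
        }
        sb.append("</xs:schema>\n");
        return sb.toString();
    }

    // Compiles the wrapper; each imported XSD is loaded from its absolute URL.
    public static Schema compile(Map<String, URL> xsdByNamespace) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        String wrapper = buildImportWrapper(xsdByNamespace);
        return factory.newSchema(new StreamSource(new StringReader(wrapper), "wrapper.xsd"));
    }
}
```

Using a `LinkedHashMap` for the argument keeps the xs:import elements in a predictable order.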
iolha