I'm trying to validate an Atom feed with Java 5 (JRE 1.5.0 update 11). The code I have works without problems on Java 6, but fails on Java 5 with:

org.xml.sax.SAXParseException: src-resolve: Cannot resolve the name 'xml:base' to a(n) 'attribute declaration' component.

I think I remember reading that the version of Xerces bundled with Java 5 has problems with some schemas, but I can't find the workaround. Is this a known problem? Or is there an error in my code?

public static void validate() throws SAXException, IOException {
    List<Source> schemas = new ArrayList<Source>();
    schemas.add(new StreamSource(AtomValidator.class.getResourceAsStream("/atom.xsd")));
    schemas.add(new StreamSource(AtomValidator.class.getResourceAsStream("/dc.xsd")));

    // Lookup a factory for the W3C XML Schema language
    SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

    // Compile the schemas.
    Schema schema = factory.newSchema(schemas.toArray(new Source[schemas.size()]));
    Validator validator = schema.newValidator();

    // load the file to validate
    Source source = new StreamSource(AtomValidator.class.getResourceAsStream("/sample-feed.xml"));

    // check the document
    validator.validate(source);
}

Update: I tried the method below, but I still have the same problem if I use Xerces 2.9.0. I also tried adding xml.xsd to the list of schemas (as xml:base is defined in xml.xsd), but this time I get:

Exception in thread "main" org.xml.sax.SAXParseException: schema_reference.4: Failed to read schema document 'null', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.

Update 2: I tried to configure a proxy with the VM arguments -Dhttp.proxyHost=<proxy.host.com> -Dhttp.proxyPort=8080 and now it works. I'll try to post a "real answer" from home.

And sorry, I can't reply as a comment: for security reasons, XHR is disabled at work...

A: 

Indeed, people have reported trouble with the SchemaFactory that Sun bundles with Java 5.

So: did you include Xerces in your project yourself?

After including Xerces, you need to ensure it is actually being used. If you want to hardcode it (though at a minimum you'd probably use an application properties file to enable and populate the following code):

String schemaFactoryProperty = 
  "javax.xml.validation.SchemaFactory:" + XMLConstants.W3C_XML_SCHEMA_NS_URI;

System.setProperty(schemaFactoryProperty,
   "org.apache.xerces.jaxp.validation.XMLSchemaFactory");

SchemaFactory factory = 
  SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

Or, if you don't want to hardcode it, or if the troublesome code lives in a third-party library that you cannot change, set it on the java command line or in the environment options. For example (all on one line, with no spaces around either =):

set JAVA_OPTS=-Djavax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema=org.apache.xerces.jaxp.validation.XMLSchemaFactory

By the way: apart from the SchemaFactory implementation bundled by Sun giving trouble (something like com.sun.org.apache.xerces.internal.jaxp.validation.xs.SchemaFactoryImpl), it also seems that the "discovery" of non-JDK implementations fails in that version. If I understand correctly, then normally just including Xerces would make SchemaFactory#newInstance find the included library and give it precedence over the Sun implementation. To my knowledge, that fails as well in Java 5, making the above configuration required.
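To check whether the override actually took effect, it may help to print the class that JAXP hands back. This is only a minimal sketch: the class name SchemaFactoryCheck is made up for illustration, and what it prints depends on the JDK and classpath in use.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper, not part of the original code.
class SchemaFactoryCheck {

    /** Returns the class name of the SchemaFactory that JAXP selects for W3C XML Schema. */
    static String implementationName() {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("SchemaFactory in use: " + implementationName());
    }
}
```

If the system property is picked up and Xerces is on the classpath, I would expect this to print org.apache.xerces.jaxp.validation.XMLSchemaFactory; a name starting with com.sun.org.apache.xerces.internal suggests the bundled implementation is still being used.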

Arjan
A: 

Hmmm, just read your "comment" -- editing does not alert people to new replies, so time to ask your boss for an iPhone or some other gadget that is connected to the net directly ;-)

Well, I assume you added:

schemas.add(
  new StreamSource(AtomValidator.class.getResourceAsStream("/xml.xsd")));

If so, is xml.xsd actually on the classpath? I wonder whether getResourceAsStream yielded null in your case, and how new StreamSource(null) would behave then.

Even if getResourceAsStream did not yield null, the resulting StreamSource would still not know where it was loaded from, which may be a problem when resolving included or imported references. So, what if you use the constructor StreamSource(String systemId) instead (getResource returns a URL, so it needs converting to a String):

schemas.add(new StreamSource(AtomValidator.class.getResource("/atom.xsd").toExternalForm()));
schemas.add(new StreamSource(AtomValidator.class.getResource("/dc.xsd").toExternalForm()));

You might also use StreamSource(InputStream inputStream, String systemId), but I don't see any advantage over the above two lines. However, the documentation explains why passing the systemId in either of the 2 constructors seems good:

This constructor allows the systemID to be set in addition to the input stream, which allows relative URIs to be processed.

Likewise, setSystemId(String systemId) explains a bit:

The system identifier is optional if there is a byte stream or a character stream, but it is still useful to provide one, since the application can use it to resolve relative URIs and can include it in error messages and warnings (the parser will attempt to open a connection to the URI only if there is no byte stream or character stream specified).
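Both concerns (a missing resource and a missing system id) could be handled in one small helper. A minimal sketch, where the names SchemaSources and fromClasspath are hypothetical and only there for illustration:

```java
import java.net.URL;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the original code.
class SchemaSources {

    /**
     * Builds a StreamSource that carries a system id, so that relative
     * references inside the schema can be resolved against it.
     * Throws instead of silently wrapping a missing resource.
     */
    static StreamSource fromClasspath(Class<?> anchor, String resourceName) {
        URL url = anchor.getResource(resourceName);
        if (url == null) {
            throw new IllegalArgumentException(
                "Not found on classpath: " + resourceName);
        }
        // StreamSource(String systemId): the parser opens the URL itself.
        return new StreamSource(url.toExternalForm());
    }
}
```

It would be used as schemas.add(SchemaSources.fromClasspath(AtomValidator.class, "/atom.xsd")), so a typo in the resource name shows up immediately rather than as a confusing 'null' in a later parser message.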

If this doesn't work out, then maybe some custom error handler can give you more details:

ErrorHandlerImpl errorHandler = new ErrorHandlerImpl();
validator.setErrorHandler(errorHandler);
:
:
validator.validate(source);

if(errorHandler.hasErrors()){
    LOG.error(errorHandler.getMessages());
    throw new [..];
}
if(errorHandler.hasWarnings()){
    LOG.warn(errorHandler.getMessages());
}

...using the following ErrorHandler to capture the validation errors and continue parsing as far as possible:

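import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Nested inside the validator class; records messages instead of throwing on the first error.
private class ErrorHandlerImpl extends DefaultHandler{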
    private String messages = "";
    private boolean validationError = false;
    private boolean validationWarning = false;

    public void error(SAXParseException exception) throws SAXException{
        messages += "Error: " + exception.getMessage() + "\n";
        validationError = true;
    }

    public void fatalError(SAXParseException exception) throws SAXException{
        messages += "Fatal: " + exception.getMessage() + "\n";
        validationError = true;
    }

    public void warning(SAXParseException exception) throws SAXException{
        messages += "Warn: " + exception.getMessage() + "\n";
        validationWarning = true;
    }

    public boolean hasErrors(){
        return validationError;
    }

    public boolean hasWarnings(){
        return validationWarning;
    }

    public String getMessages(){
        return messages;
    }
}
Arjan
A: 

I tried to configure a proxy with the VM arguments -Dhttp.proxyHost=<proxy.host.com> -Dhttp.proxyPort=8080 and now it works.

Ah, I didn't realize that xml.xsd is in fact the one referenced as http://www.w3.org/2001/xml.xsd or something like that. That should teach us to always show some XML and XSD fragments as well. ;-)

So, am I correct to assume that 1.) to fix the Java 5 issue, you still needed to include Xerces and set the system property, and that 2.) you did not have xml.xsd available locally?

Before you found your solution, did you happen to try using getResource rather than getResourceAsStream, to see if the exception would then have shown you some more details?

If you actually did have xml.xsd available (so: if getResource did in fact yield a URL) then I wonder what Xerces was trying to fetch from the internet then. Or maybe you did not add that schema to the list prior to adding your own schemas? The order is important: dependencies must be added first.

For whoever gets to this question using the search: maybe using a custom EntityResolver could have indicated the source of the problem as well (if only by writing something to the log and just returning null to tell Xerces to use the default behavior).
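A rough sketch of that idea follows. Note that javax.xml.validation takes an LSResourceResolver (set through SchemaFactory#setResourceResolver) rather than a SAX EntityResolver, so that is what is shown here; the class name LoggingResourceResolver is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical helper, not part of the original code.
class LoggingResourceResolver implements LSResourceResolver {

    private final List<String> requests = new ArrayList<String>();

    public LSInput resolveResource(String type, String namespaceURI,
            String publicId, String systemId, String baseURI) {
        String line = "namespace=" + namespaceURI
            + " systemId=" + systemId + " baseURI=" + baseURI;
        requests.add(line);
        System.err.println("Schema resource requested: " + line);
        // Returning null asks the parser to fall back to its default lookup.
        return null;
    }

    /** Everything the parser asked for so far, in request order. */
    List<String> getRequests() {
        return requests;
    }
}
```

Calling factory.setResourceResolver(new LoggingResourceResolver()) before factory.newSchema(...) should then log each import or include the parser tries to resolve, which in this case would presumably have shown the request for the remote xml.xsd.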

Arjan