
I'm copying code from one part of our application (an applet) into the app itself. I'm parsing XML from a String. It's been a while since I parsed XML, but the error that's thrown suggests the parser can't find the .dtd. The stack trace makes it hard to pin down the exact cause, but here's the message:

java.net.MalformedURLException: no protocol: <a href="http://www.mycomp.com/MyComp.dtd">http://www.mycomp.com/MyComp.dtd</a>
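The message itself comes from java.net.URL, which throws MalformedURLException with "no protocol" when a string does not begin with a scheme such as http:. A minimal sketch that reproduces it outside the parser, assuming (as the message suggests) that the parser hands the system identifier to the URL constructor unchanged; the class and method names are hypothetical:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class NoProtocolDemo {
    // The system identifier exactly as it appears in the DOCTYPE:
    // an HTML anchor element wrapped around the real URL.
    static final String BAD_SYSTEM_ID =
        "<a href=\"http://www.mycomp.com/MyComp.dtd\">http://www.mycomp.com/MyComp.dtd</a>";

    // Returns the MalformedURLException message for spec, or null if spec parses as a URL.
    static String urlError(String spec) {
        try {
            new URL(spec); // only parses the string; no connection is opened
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same text as in the stack trace: "no protocol: <a href=...>...</a>"
        System.out.println(urlError(BAD_SYSTEM_ID));
        // The bare URL parses, so this prints "null"
        System.out.println(urlError("http://www.mycomp.com/MyComp.dtd"));
    }
}
```

The scheme check fails on the leading `<`, which is why the real URL inside the anchor is never reached.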

The XML starts with these lines:

<?xml version='1.0'?>
<!DOCTYPE MYTHING  SYSTEM '<a href="http://www.mycomp.com/MyComp.dtd">http://www.mycomp.com/MyComp.dtd</a>'>

Here are the relevant code snippets:

class XMLImportParser extends DefaultHandler {

  private SAXParser m_SaxParser = null;
  private String is_InputString = "";

  XMLImportParser(String xmlStr) throws SAXException, IOException {
    super();
    is_InputString = xmlStr;
    createParser();
    try {
      preparseString();
      parseString(is_InputString);
    } catch (Exception e) {
       throw new SAXException(e); //"Import Error : "+e.getMessage());
    }
  }

  void createParser() throws SAXException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);
    try {
        factory.setFeature("http://xml.org/sax/features/namespaces", true);
        factory.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
        m_SaxParser = factory.newSAXParser();
        m_SaxParser.getXMLReader().setFeature("http://xml.org/sax/features/namespaces", true);
        m_SaxParser.getXMLReader().setFeature("http://xml.org/sax/features/namespace-prefixes", true);
    } catch (SAXNotRecognizedException snre){
        throw new SAXException("Failed to create XML parser");  
    } catch (SAXNotSupportedException snse) {
        throw new SAXException("Failed to create XML parser");  
    } catch (Exception ex) {
        throw new SAXException(ex);  
    }
  }

  void preparseString() throws SAXException {
    try {
        InputSource lSource = new InputSource(new StringReader(is_InputString));
        lSource.setEncoding("UTF-8");
        m_SaxParser.parse(lSource, this);
    } catch (Exception ex) {
        throw new SAXException(ex);
    }
  }

}

The error appears to be thrown in preparseString(), on the line that does the actual parsing: m_SaxParser.parse(lSource, this);

FYI, the 'MyComp.dtd' file does exist at that location and is accessible over HTTP. The XML comes from a different service on the server, so I can't change the DOCTYPE to a file:// URL and put the .dtd file on the classpath.

+4  A: 

I think you have some extra markup in the DOCTYPE declaration. Try this:

<?xml version='1.0'?>
<!DOCTYPE MYTHING  SYSTEM "http://www.mycomp.com/MyComp.dtd">

This form follows the W3C's list of recommended DOCTYPE declarations: http://www.w3.org/QA/2002/04/valid-dtd-list.html
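To check that the corrected declaration gives the parser a usable URL, a small sketch can record the system ID that the parser passes to the handler's resolveEntity callback. The class name and the `<MYTHING/>` body are hypothetical, and the resolver returns an empty DTD stub so nothing is fetched over the network:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CorrectedDoctypeCheck {
    // Corrected prolog from the answer, plus a hypothetical empty root element.
    static final String XML =
        "<?xml version='1.0'?>\n"
        + "<!DOCTYPE MYTHING  SYSTEM \"http://www.mycomp.com/MyComp.dtd\">\n"
        + "<MYTHING/>";

    // Parses XML (non-validating) and returns the system ID the parser asked to resolve.
    static String requestedSystemId() throws Exception {
        final String[] seen = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                seen[0] = systemId;
                // Stub: an empty external DTD, so the sketch runs offline.
                return new InputSource(new StringReader(""));
            }
        };
        // A new factory is non-validating by default but still reads the external DTD.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(XML)), handler);
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(requestedSystemId()); // the plain http URL from the DOCTYPE
    }
}
```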

You can also use the http link to build a Schema and set it on the SAXParserFactory before creating your parser:

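void createParser() throws SAXException {
    // newSchema() is an instance method, so get a SchemaFactory first. Caveat: the
    // JDK's built-in SchemaFactory supports W3C XML Schema only, so XML_DTD_NS_URI
    // throws IllegalArgumentException without a third-party provider. Keep these
    // calls in the existing try/catch: new URL(...) throws MalformedURLException.
    Schema schema = SchemaFactory.newInstance(XMLConstants.XML_DTD_NS_URI)
        .newSchema(new URL("http://www.mycomp.com/MyComp.dtd"));
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);
    factory.setSchema(schema);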
John Engelman
Thanks for the quick response. I think this is the more complete answer, since it shows the fully corrected DOCTYPE tag. Please see my question above about the possibility of ignoring this DOCTYPE tag, since I'm getting it from an external source.
Thanks for the quick answers.
You can set the Schema on the SAXParserFactory to one created from the http link. I'll post an edit to the answer above.
John Engelman
+4  A: 

The problem is that this:

<a href="http://www.mycomp.com/MyComp.dtd">http://www.mycomp.com/MyComp.dtd</a>

is an HTML hyperlink, not a URL. Replace it with this:

http://www.mycomp.com/MyComp.dtd
Stephen C
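Because the application already receives the document as a String (is_InputString), one possible workaround is to apply this replacement to the string before parsing it. A minimal sketch, assuming the service always sends exactly this hyperlink and that the string occurs nowhere else in the document; the class and method names are hypothetical:

```java
public class DoctypeRewrite {
    // The hyperlink-wrapped system ID as the external service sends it.
    static final String BAD_ID =
        "<a href=\"http://www.mycomp.com/MyComp.dtd\">http://www.mycomp.com/MyComp.dtd</a>";
    // The plain URL the parser can actually open.
    static final String GOOD_ID = "http://www.mycomp.com/MyComp.dtd";

    // Replaces every occurrence of the bad system ID with the plain URL.
    static String fixDoctype(String xml) {
        return xml.replace(BAD_ID, GOOD_ID);
    }

    public static void main(String[] args) {
        String xml = "<?xml version='1.0'?>\n"
            + "<!DOCTYPE MYTHING  SYSTEM '" + BAD_ID + "'>\n<MYTHING/>";
        // The DOCTYPE line becomes: <!DOCTYPE MYTHING  SYSTEM 'http://www.mycomp.com/MyComp.dtd'>
        System.out.println(fixDoctype(xml));
    }
}
```

In the question's class this would amount to assigning `is_InputString = fixDoctype(xmlStr)` in the constructor; it is a text patch, so the EntityResolver approach is the more robust option if the hyperlink's exact form might change.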
Thanks, that did it. I had a local copy of the XML and changed it. However, when running live, I can't modify this DOCTYPE line because I'm getting it from an external service. Is there any way to tell the parser to ignore it? I see references to validating/non-validating in other parts of the code that make me wonder if non-validating would make it ignore the bad DTD reference.
Setting your parser to non-validating will cause the entire DTD to be ignored. The reason for having the DTD is so that the parser can validate the input XML against it.
John Engelman
@codeman73 - you should try to get whatever is giving you that DOCTYPE fixed. It is clearly bogus.
Stephen C
@John, disabling validation will not stop the parser from reading the DTD. The DTD can also define entities and attribute default values that are needed to parse the XML.
Jörn Horstmann
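Jörn's point can be made concrete. With the JDK's built-in (Xerces-based) parser, turning validation off still reads the external DTD, but the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd` can switch that off. A sketch, assuming the default JDK parser is in use (other parsers may not recognise this feature); any entities or default attribute values declared in the DTD are then unavailable. The class name and the `<MYTHING/>` body are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SkipExternalDtd {
    // Xerces feature recognised by the JDK's default SAX parser.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // The prolog from the question, with a hypothetical empty root element.
    static final String BAD_XML =
        "<?xml version='1.0'?>\n"
        + "<!DOCTYPE MYTHING  SYSTEM '<a href=\"http://www.mycomp.com/MyComp.dtd\">"
        + "http://www.mycomp.com/MyComp.dtd</a>'>\n"
        + "<MYTHING/>";

    // Parses xml without reading the external DTD named in its DOCTYPE.
    static void parseWithoutExternalDtd(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);                 // the feature only applies when not validating
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // never open the external subset
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
    }

    public static void main(String[] args) throws Exception {
        // Under default settings the question reports a "no protocol" error for this input.
        parseWithoutExternalDtd(BAD_XML);
        System.out.println("parsed without loading the DTD");
    }
}
```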
+1  A: 

Since this XML comes from an external source, the first thing to do would be to complain to them that they are sending invalid XML.

As a workaround, you can set an EntityResolver on your parser that compares the SystemId to this invalid url and returns an InputSource for the correct http URL:

m_SaxParser.getXMLReader().setEntityResolver(
    new EntityResolver() {
        public InputSource resolveEntity(final String publicId, final String systemId) throws SAXException {
            if ("<a href=\"http://www.mycomp.com/MyComp.dtd\">http://www.mycomp.com/MyComp.dtd</a>".equals(systemId)) {
                return new InputSource("http://www.mycomp.com/MyComp.dtd");
            } else {
                return null;
            }
        }
    }
);
Jörn Horstmann
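One detail matters for the question's code: SAXParser.parse(InputSource, DefaultHandler) installs the handler as the reader's EntityResolver (along with its ContentHandler, ErrorHandler, and DTDHandler), so a resolver set earlier through getXMLReader() is replaced when preparseString() calls m_SaxParser.parse(lSource, this). Since XMLImportParser already extends DefaultHandler, a sketch that overrides resolveEntity in the handler itself avoids that. The class name is a hypothetical stand-in, and, like the answer, it assumes the parser reports the hyperlink string verbatim as systemId (logging systemId once is a cheap way to confirm):

```java
import java.io.IOException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for XMLImportParser: the resolver lives in the handler,
// so parse(InputSource, DefaultHandler) installs it rather than overwriting it.
public class FixingHandler extends DefaultHandler {
    static final String BAD_ID =
        "<a href=\"http://www.mycomp.com/MyComp.dtd\">http://www.mycomp.com/MyComp.dtd</a>";
    static final String GOOD_ID = "http://www.mycomp.com/MyComp.dtd";

    // Maps the known-bad system ID to the real DTD URL; null for anything else.
    static String correctedSystemId(String systemId) {
        return BAD_ID.equals(systemId) ? GOOD_ID : null;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException, SAXException {
        String fixed = correctedSystemId(systemId);
        // An InputSource holding only a system ID makes the parser open that URL itself;
        // returning null keeps the parser's default resolution.
        return fixed != null ? new InputSource(fixed) : null;
    }
}
```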