I'm using the StAX event-based APIs to modify an XML stream. The stream represents an HTML document, complete with a DTD declaration. I would like to copy this DTD declaration into the output document (written using an XMLEventWriter). When I ask the factory to disregard DTDs, it does not download the DTD, but it removes the whole statement and leaves only a "<!DOCUMENTTYPE" string. When DTDs are not disregarded, the whole DTD gets downloaded and is included when the DTD event is output verbatim. I don't want to spend the time downloading this DTD, but I do want to include the complete DTD specification (resolving entities is already disabled, and I don't need it). Does anyone know how to disable the fetching of external DTDs?

+3  A: 

You should be able to implement a custom XMLResolver that redirects attempts to fetch external DTDs to a local resource (if your code parses only a specific doc type, this is often a class resource right in a JAR).

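import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;

class CustomResolver implements XMLResolver {

  @Override
  public Object resolveEntity(String publicID,
                              String systemID,
                              String baseURI,
                              String namespace)
                  throws XMLStreamException
  {
    if ("The public ID you expect".equals(publicID)) {
      // Serve the DTD from a local class resource instead of the network.
      return getClass().getResourceAsStream("doc.dtd");
    } else {
      // Returning null leaves resolution to the parser's default mechanism.
      return null;
    }
  }
}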

Note that some documents include only a system ID, so you should fall back to checking that. The problem with the system identifier is that it is supposed to be a system-specific URL rather than a well-known, stable URI. In practice, though, it is often used as if it were such a URI.

See the XMLInputFactory.setXMLResolver method.
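
As a rough sketch of how this could be wired together, the resolver below checks the public ID first, falls back to the system ID, and is then registered on the factory. The class name ResolverSetup, the nested LocalDtdResolver, and the two XHTML identifiers are only illustrative assumptions; substitute the identifiers from your own DOCTYPE. To stay self-contained it returns an empty stream rather than a bundled DTD file, which assumes you do not need the DTD's contents. Whether the parser consults the resolver for the external DTD subset depends on the StAX implementation.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;

class ResolverSetup {

    // Example identifiers only (assumption): use the ones from your DOCTYPE.
    public static final String EXPECTED_PUBLIC_ID =
            "-//W3C//DTD XHTML 1.0 Strict//EN";
    public static final String EXPECTED_SYSTEM_ID =
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    public static class LocalDtdResolver implements XMLResolver {
        @Override
        public Object resolveEntity(String publicID,
                                    String systemID,
                                    String baseURI,
                                    String namespace) {
            // Prefer the public ID; fall back to the system ID when the
            // document declares only that.
            if (EXPECTED_PUBLIC_ID.equals(publicID)
                    || EXPECTED_SYSTEM_ID.equals(systemID)) {
                // Empty stand-in for the DTD, so nothing is fetched remotely.
                // A real setup might return a class resource here instead.
                return new ByteArrayInputStream(new byte[0]);
            }
            // Unknown entity: leave it to the parser's default mechanism.
            return null;
        }
    }

    public static XMLInputFactory newFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Register the resolver on the factory before creating any readers.
        factory.setXMLResolver(new LocalDtdResolver());
        return factory;
    }
}
```

Readers created from ResolverSetup.newFactory() will then be handed the resolver, provided the underlying provider honors it.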

erickson
Setting this works only when I use Woodstox. I was trying to use the Sun provider.
Paul de Vrieze
+1  A: 

Also: your original approach (setting SUPPORT_DTD to false) might work with Woodstox, if you have so far been using the default Sun StAX parser bundled with JDK 1.6.
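
For reference, a minimal sketch of that setting, assuming a hypothetical helper class named NoDtdFactory. It only turns the property off on whichever XMLInputFactory the runtime provides; whether the DOCTYPE declaration then survives in the output is up to the implementation, as the question describes for the Sun parser.

```java
import javax.xml.stream.XMLInputFactory;

class NoDtdFactory {

    public static XMLInputFactory create() {
        // Picks up whichever StAX provider is available (for example Woodstox,
        // if it is on the classpath, or the JDK's bundled implementation).
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser not to process DTDs.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        return factory;
    }
}
```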

StaxMan