So in my current project I use the JAXB RI with the default Java parser from Sun's JRE (which I believe is Xerces) to unmarshal arbitrary XML.

First I use XJC to compile an XSD of the following form:

<?xml version="1.0" encoding="utf-8" ?> 
<xs:schema attributeFormDefault="unqualified" 
elementFormDefault="qualified" 
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="foobar">
...
</xs:element> 
</xs:schema>

In the "good case" everything works as designed. That is to say, if I'm passed XML that conforms to this schema, JAXB correctly unmarshals it into an object tree.

The problem comes when I'm passed XML with an external DTD reference, e.g.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foobar SYSTEM "http://blahblahblah/foobar.dtd">
<foobar></foobar>

Upon unmarshalling something like this, the SAX parser attempts to load the remote entity ("http://blahblahblah/foobar.dtd") despite the fact that this snippet clearly does not conform to the schema I compiled earlier with XJC.

In order to circumvent this behavior, since I know that any conformant XML (according to the XSD I compiled) will never require the loading of a remote entity, I have to define a custom EntityResolver that short circuits the loading of all remote entities. So instead of doing something like:

MyClass foo = (MyClass) myJAXBContext.createUnmarshaller().unmarshal(myReader);

I'm forced to do this:

XMLReader myXMLReader = mySAXParser.getXMLReader();
myXMLReader.setEntityResolver(myCustomEntityResolver);
SAXSource mySAXSource = new SAXSource(myXMLReader, new InputSource(myReader));
MyClass foo = (MyClass) myJAXBContext.createUnmarshaller().unmarshal(mySAXSource);
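
For readers who want to see the resolver part spelled out, here is a minimal sketch using only the SAX classes that ship with the JDK. The class name `NoRemoteEntities`, the `EMPTY_RESOLVER` constant, and the helper methods are made up for illustration and are not from my actual project; the JAXB call is left out so the snippet runs without extra jars.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for this example only.
class NoRemoteEntities {

    // Resolves every external entity (including the external DTD subset)
    // to an empty stream, so the parser never has to open a connection.
    static final EntityResolver EMPTY_RESOLVER =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    // Builds a namespace-aware XMLReader with the short-circuiting resolver installed.
    static XMLReader newSafeReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(EMPTY_RESOLVER);
        return reader;
    }

    // Parses the XML and returns the local name of the root element,
    // or null if parsing fails for any reason.
    static String rootElement(String xml) {
        final String[] root = new String[1];
        try {
            XMLReader reader = newSafeReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = localName;
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return null;
        }
        return root[0];
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE foobar SYSTEM \"http://blahblahblah/foobar.dtd\">"
                + "<foobar></foobar>";
        System.out.println("root element: " + rootElement(xml));
    }
}
```

In the JAXB case, the reader returned by `newSafeReader()` would be wrapped in a `SAXSource` and handed to the unmarshaller, as in the four lines above.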

So my ultimate question is:

When unmarshalling with JAXB, should the loading of remote entities by the underlying SAX parser be automatically short circuited when the XML in question can be recognized as invalid without the loading of those remote entities?

Also, doesn't this seem like a security issue? Given that JAX-WS relies on JAXB under the hood, it seems like I could pass specially crafted XML to any JAX-WS based web service and cause the WS host to load any arbitrary URL.

I'm a relative newbie to this, so there's probably something I'm missing. Please let me know if so!

+4  A: 

A well-crafted question, it deserves an answer :)

Some things to note:

  1. The JAXB runtime is not dependent on XML Schema. It uses a SAX parser to generate a stream of SAX events, which it binds onto the object model. This object model can be hand-written, or can be generated from a schema using XJC, but the binding and the runtime are very distinct from each other. So you may know that good XML input conforms to the schema at runtime, but JAXB does not.
  2. Forcing the runtime to load a remote DTD reference does not constitute a security hole. If there's a real DTD at the end of it, the worst case is that it won't validate. If it's not a real DTD, then it'll be ignored.
  3. DTD is considered obsolete, and so there's no direct support for it in the high level JAXB API. If you need an EntityResolver, you need to dig into the SAX API, which you have already done.
  4. If your class model was generated from an XML Schema, then you should consider validating against it at runtime, using SchemaFactory and Unmarshaller.setSchema(). This will instruct Xerces to validate the SAX events against the schema before they are passed to JAXB. This won't stop the DTD being fetched, but it adds a layer of safety that you know the data is good.
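
A rough sketch of the schema-loading half of point 4, using only `javax.xml.validation` from the JDK. The `SchemaCheck` class and its methods are hypothetical names, and the XSD body is a stand-in: the real content of the `foobar` element is elided in the question, so `type="xs:string"` here is purely an assumption to make the example compile and run.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class, named for this example only.
class SchemaCheck {

    // Stand-in schema: the element type is an assumption, not the asker's real XSD.
    static final String XSD =
            "<xs:schema attributeFormDefault=\"unqualified\""
            + " elementFormDefault=\"qualified\""
            + " xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"foobar\" type=\"xs:string\"/>"
            + "</xs:schema>";

    // Compiles the XSD into a reusable Schema object.
    static Schema loadSchema() throws SAXException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(XSD)));
    }

    // Returns true if the document validates against the schema, false otherwise.
    // With no ErrorHandler set, the Validator throws on the first validation error.
    static boolean isValid(String xml) {
        try {
            Validator validator = loadSchema().newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("conforming document valid: " + isValid("<foobar>hello</foobar>"));
        System.out.println("unknown root valid: " + isValid("<other/>"));
    }
}
```

With JAXB on the classpath, the object returned by `loadSchema()` is what would be passed to `Unmarshaller.setSchema()`, so that validation happens during unmarshalling instead of as a separate step.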
skaffman
Re: #2: My concern stems from the fact that this is all going to be happening in a Servlet. I would not like a malicious client to be able to force my Servlet into attempting to load remote URLs. For instance, someone could cause my Servlet to DoS a third site. Or they could potentially DoS my server by submitting multiple requests, each of which causes the parser to attempt to load a (nonexistent) URL. It takes the parser ~30 seconds to throw a ConnectException when attempting to load a bogus URL.
Re: #4: Here's what I don't get. I compiled my XSD using XJC. Is it, in fact, true that no XML snippet containing a remote DTD reference will ever validate? If that's the case, then it seems like there should be some facility inside JAXB (or something lower down, i.e. the parser) to "fail fast" when a remote DTD reference is detected. Since we know the XML isn't going to validate regardless of the DTD's contents, why do we need to load it?
Re #2: Like I said, you can use the `EntityResolver` to stop DTD lookups. Re #4: The presence of the DTD reference will not prevent validation against the schema.
skaffman
Ah, interesting. So that's where the confusion was. I had assumed that the DTD reference was bound to the snippet, meaning any snippet with a reference would be doomed to fail. Still... the loading of remote entity references seems like something many (most?) users would want to disable, if they knew a priori that any "good" XML they'll be asked to parse will not reference any external entities. The current way of achieving this is sort of... unwieldy. Would be nice if there were something like setLoadRemoteEntities(boolean) that would take care of it.
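
Something close to that switch may already exist at the parser level, though nobody in this thread mentioned it, so treat the following as a sketch and not as the recommended approach. Xerces (and the parser bundled with the JDK, which is derived from it) recognizes a `disallow-doctype-decl` feature that makes the parser reject any document containing a DOCTYPE declaration before anything is fetched. The class and method names below are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class, named for this example only.
class FailFastOnDoctype {

    // Xerces feature: when true, any DOCTYPE declaration is a fatal error.
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    // Builds an XMLReader that refuses documents carrying a DOCTYPE.
    // Assumes the underlying parser is Xerces-based; other parsers may
    // throw SAXNotRecognizedException from setFeature.
    static XMLReader newStrictReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(DISALLOW_DOCTYPE, true);
        return factory.newSAXParser().getXMLReader();
    }

    // Returns true if the document parses, false if the parser rejects it.
    static boolean accepts(String xml) {
        try {
            newStrictReader().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String withDoctype = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE foobar SYSTEM \"http://blahblahblah/foobar.dtd\">"
                + "<foobar></foobar>";
        System.out.println("plain document accepted: " + accepts("<foobar></foobar>"));
        System.out.println("document with DOCTYPE accepted: " + accepts(withDoctype));
    }
}
```

As with the EntityResolver approach, the resulting reader can be wrapped in a `SAXSource` before unmarshalling. The Xerces feature list also documents `http://apache.org/xml/features/nonvalidating/load-external-dtd`, which skips the external DTD without rejecting the document; which of the two fits depends on whether a DOCTYPE should be an error or just ignored.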