views:

504

answers:

3

I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<details>
  ...
  <address1>Test&amp;Address</address1>
  ...
</details>

When I try to unmarshal it using JAXB, it throws the following exception:

Caused by: org.xml.sax.SAXParseException: The reference to entity "Address" must end with the ';' delimiter.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:194)

But when I changed the &amp; in the XML to &apos;, it works. Looks like the problem is only with ampersand &amp; and I cannot understand why.

The code to unmarshal is:

JAXBContext context = JAXBContext.newInstance("some.package.name", this.getClass().getClassLoader());
Unmarshaller unmarshaller = context.createUnmarshaller();
obj = unmarshaller.unmarshal(new StringReader(xml));

Anyone have some insight?

EDIT: I tried the solution suggested by @abhin4v below (ie, add a space after &amp;), but it doesn't seem to work too. Here's the stacktrace:

Caused by: org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:194)
+2  A: 

Xerces converts &amp; to & and then tries to resolve &Address which fails because it does not end with ;. Put a space between & and Address and it should work. Putting a space will not work as Xerces will now try to resolve & and throw the second error given in OP. You can wrap the test in a CDATA section and Xerces will not try to resolve the entities.

abhin4v
Doesn't work :(See my edit in the question.
ryanprayogo
+2  A: 

I've run into this too. First pass I simply replaced the &amp to a token string (AMPERSAND_TOKEN), sent it through JAXB, then re-replaced the ampersand. Not ideal, but it was a quick fix.

Second pass I made a lot of significant changes, so I'm not sure what exactly solved the problem. I suspect that providing JAXB access to the html dtds made it much happier, but that's only a guess and could be specific to my project.

HTH

Quotidian
A: 

It turns out that the problem is because of the framework I'm using (Mentawai framework). The said XML comes from the POST body of an HTTP request.

Apparently, the framework converts the character entities in the XML body, therefore, &amp; becomes & and the unmarshaller fails to unmarshal the XML.

ryanprayogo
Urgh, that's a pretty dumb thing for it to do. Doesn't exactly inspire confidence in the rest of the framework.
skaffman
Yeah, unfortunately it's the choice of the company to use this particular framework. I can only complain :(
ryanprayogo
That has absolutely nothing to do with the framework being used. Mentawai does not perform any kind o conversion on the HTTP level. It passes the POST parameters as is.
Sergio Oliveira Jr.