answers:

2
A: 

Resolving entities is not JAXB's job. It is the job of the underlying XML parser.

What you could do is:

  • read the data yourself using DOM
  • replace all unresolved entities by something you wish
  • then, let JAXB handle the result
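
A minimal sketch of the "replace, then let JAXB handle it" idea. Note that it swaps the DOM step for a plain text pre-pass, because a parser will usually reject a document that references undeclared entities before you ever get a DOM tree. The class name, the method name, and the tiny entity table are made up for illustration; a real table would be built from the W3C entity files.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityPreprocessor {
    // Hypothetical, deliberately tiny table: entity name -> Unicode code point.
    // The five predefined XML entities (amp, lt, gt, quot, apos) are left out
    // on purpose so they pass through unchanged.
    private static final Map<String, Integer> HTML_ENTITIES = Map.of(
            "nbsp", 160,
            "copy", 169,
            "eacute", 233,
            "mdash", 8212);

    private static final Pattern ENTITY_REF =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Rewrites known named entities as numeric character references. */
    public static String replaceEntities(String xml) {
        Matcher m = ENTITY_REF.matcher(xml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = HTML_ENTITIES.get(m.group(1));
            // Unknown names are copied through untouched.
            String replacement = (codePoint != null)
                    ? "&#" + codePoint + ";"
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The cleaned string can then be handed to JAXB, for example with unmarshaller.unmarshal(new StringReader(cleaned)). One limitation of a regex pass like this: it also rewrites text inside CDATA sections, which may or may not be what you want.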
ivan_ivanovich_ivanoff
Since JAXB abstracts away the XML parser, I think it should provide some means to fix things that need to be managed at the parser level (maybe just by using dependency injection to supply a different parser). Then again, I was giving JAXB invalid input -- so what did I expect? ;)
rcreswick
A: 

This is a hack, but it works in a pinch.

I downloaded the HTML entity definitions from w3.org and set the DOCTYPE of the input XML file to XHTML 1.0 Transitional, but pointed the DOCTYPE's system identifier at a local DTD:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "xhtml1-transitional.dtd">

xhtml1-transitional.dtd, in turn, requires:

  • xhtml-lat1.ent
  • xhtml-special.ent
  • xhtml-symbol.ent

which I downloaded and put alongside xhtml1-transitional.dtd.

(All files are available at: http://www.w3.org/TR/xhtml1/DTD/ )

Like I said, ugly as hell, but it did seem to do the job.
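
For what it's worth, here is a rough sketch of the same "use a local copy" idea done at the parser level with a SAX EntityResolver, so the system identifier in the DOCTYPE can stay as it is. It uses only the JDK's own XML parser; the class and method names are hypothetical, and it assumes the DTD and .ent files sit together in one directory.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdParser {

    /** Maps any requested DTD/entity file to a same-named file in dtdDir. */
    public static EntityResolver localResolver(Path dtdDir) {
        return (publicId, systemId) -> {
            if (systemId == null) {
                return null;
            }
            // Keep only the last path segment, e.g. "xhtml-lat1.ent".
            String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
            Path local = dtdDir.resolve(fileName);
            if (!Files.isRegularFile(local)) {
                // Returning null lets the parser fall back to its default
                // behaviour, which may mean a network fetch.
                return null;
            }
            InputSource source = new InputSource(Files.newInputStream(local));
            source.setPublicId(publicId);
            source.setSystemId(local.toUri().toString());
            return source;
        };
    }

    /** Parses xml into a DOM tree, resolving external entities from dtdDir. */
    public static Document parse(String xml, Path dtdDir) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(localResolver(dtdDir));
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

The resulting Document has the entities expanded, and JAXB can unmarshal from a DOM node directly via unmarshaller.unmarshal(document).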

rcreswick
Is there a reason you had to make the dtd document local? What happened with a remove DTD document?
Kathy Van Stone
w3.org returned an error code when JAXB tried to retrieve the DTD directly -- even though the URL worked in a browser. I surmised that w3.org is blocking access to DTDs and similar files based on user agent, to stop people from hitting their servers from APIs. (w3.org posted a plea for people to stop writing apps that do that a year or so ago: http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic ) I'm not sure what you mean by "what happened with a remove DTD document?".
rcreswick
You probably want to use an XML catalog to point at a local copy rather than changing the DOCTYPE itself. See http://xml.apache.org/commons/components/resolver/resolver-article.html for details.
Dominic Mitchell