
Hi,

my XML is referencing a DTD like this:

<!DOCTYPE article PUBLIC "-//OWNER//NAME//EN" "http://invalid/path/to.dtd">

The DTD is not available at the given URL, but I can download it to my disk. I tried to implement a custom XmlResolver to load the DTD, but it does not work. My custom XmlResolver implements GetEntity, and in the debugger I can see the following calls coming in:

  1. The requested URI is the XML document to be loaded. I open a stream for this document and return it. That works fine.
  2. The DTD is requested as a URI of the format "file:///absolut/path/to.xml/-//OWNER//NAME//EN". I'm using a regular expression to check for -//.*?//, which works but does not look very clean to me. Still, if the DTD is self-contained, it works.
  3. The DTD references modules.ent. That results in a call to GetEntity with a URI of "file:///absolut/path/to.xml/-//OWNER//NAME//modules.ent". At this point it gets quite awkward to reconstruct what the path was meant to be.

Any hints on how to implement this correctly? I think public external DTDs are quite common in the publishing sector, so there must be a clean solution!?

cheers, Achim

+1  A: 

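"file:///absolut/path/to.xml/-//OWNER//NAME//EN" is not a real location: it is the URI of your document with the PUBLIC identifier appended to it. The SYSTEM identifier ("http://invalid/path/to.dtd") does not appear in it at all. Generally, you want to look at either the PUBLIC or the SYSTEM identifier on its own, and certainly not at a string glued together from several parts. When you say "DTD is requested as a URI of the format," it is not clear who is doing the requesting. It appears that the calling code builds this string before GetEntity() is ever called.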

If you have the DTD as a disk file and all you need to do is map one URI to another, you can override ResolveUri() instead of the full GetEntity(). GetEntity() is more useful if you have resources that are inaccessible as straightforward URIs, e.g. you compute the content of the resource at runtime, you fetch it from a database, you use a non-standard URL scheme and protocol like svn: etc.
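A minimal sketch of the ResolveUri() approach, assuming the DTD and modules.ent sit side by side in a local directory. The class name LocalDtdResolver, its Add() method, and any local path you register are hypothetical and only serve as illustration. Only ResolveUri() is overridden; GetEntity() is inherited from XmlUrlResolver and opens the mapped file as usual.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical resolver: maps known PUBLIC or SYSTEM identifiers to local files.
public class LocalDtdResolver : XmlUrlResolver
{
    // Identifier (exactly as written in the DOCTYPE) -> local file URI.
    private readonly Dictionary<string, Uri> _map =
        new Dictionary<string, Uri>(StringComparer.Ordinal);

    // Register a local copy for an identifier; localPath is made absolute.
    public void Add(string identifier, string localPath)
    {
        _map[identifier] = new Uri(Path.GetFullPath(localPath));
    }

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        Uri local;
        // A known identifier is answered with the local file URI,
        // ignoring the base URI of the referencing document.
        if (relativeUri != null && _map.TryGetValue(relativeUri, out local))
        {
            return local;
        }
        // Anything else (e.g. "modules.ent") is resolved the default way,
        // relative to whatever base URI the reader passes in.
        return base.ResolveUri(baseUri, relativeUri);
    }
}
```

To use it, register both "-//OWNER//NAME//EN" and "http://invalid/path/to.dtd" with the path of the downloaded DTD and assign the resolver to XmlReaderSettings.XmlResolver (or XmlTextReader.XmlResolver). If the reader then passes the mapped file URI as the base URI for entities declared inside the DTD, which is an assumption worth checking in the debugger, a relative reference such as modules.ent resolves to the same local directory without any extra code.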

iter