Given an XML document like this:

 <!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>
 <author>john</author>
 <doc>
   <title>&title;</title>
 </doc>

I want to parse the above XML document and generate a copy of it with all of its entities resolved. Given the document above, the output should be:

 <!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>
 <author>john</author>
 <doc>
   <title>Stack Overflow Madness</title>
 </doc>

I know that I could implement an org.xml.sax.EntityResolver to resolve the entities, but I don't know how to generate a copy of the XML document with everything else still intact. By everything, I mean the whitespace, the DTD declaration at the top of the document, the comments, and anything else apart from the entities, which should be replaced by their resolved values. If this is not possible, please suggest an approach that preserves as much as possible (e.g. everything except the comments).

Note also that I am restricted to the pure Java API provided by Sun, so no third-party libraries can be used.

Thanks very much!

EDIT: The above XML document is a much simplified version of the original. The original involves a very complex entity resolution using EntityResolver, which I have greatly simplified in this question. What I am really interested in is how to produce an exact copy of the XML document using an XML parser that resolves entities through an EntityResolver.
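A minimal sketch of the closest you can probably get with the JDK alone. It assumes a well-formed variant of the example (the `<author>` element is moved inside `<doc>`, because a document may have only one root element) and a hypothetical in-memory DTD standing in for http://www.blabla.com/mydoc.dtd; the class name `ExpandEntities` is illustrative. The EntityResolver supplies the DTD, the DOM parser expands `&title;`, and the JDK identity Transformer writes the tree back out. Comments and whitespace inside the root element survive and the DOCTYPE is re-declared by hand, but quote characters, whitespace inside tags, whitespace outside the root element and any internal DTD subset are not reproduced.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExpandEntities {
    // Hypothetical stand-in for the contents of http://www.blabla.com/mydoc.dtd.
    static final String FAKE_DTD = "<!ENTITY title 'Stack Overflow Madness'>";

    public static String expand(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setExpandEntityReferences(true); // the default, stated for clarity
        dbf.setIgnoringComments(false);      // keep comment nodes in the tree
        DocumentBuilder db = dbf.newDocumentBuilder();
        // The EntityResolver serves the external DTD from memory;
        // the real, more complex resolution logic would go here.
        db.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("mydoc.dtd")
                        ? new InputSource(new StringReader(FAKE_DTD))
                        : null);
        Document doc = db.parse(new InputSource(new StringReader(xml)));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // The identity transform does not copy the DOCTYPE from a DOM, so re-declare it.
        if (doc.getDoctype() != null && doc.getDoctype().getSystemId() != null) {
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doc.getDoctype().getSystemId());
        }
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>\n"
                + "<doc>\n"
                + "  <!-- a comment -->\n"
                + "  <author>john</author>\n"
                + "  <title>&title;</title>\n"
                + "</doc>";
        System.out.println(expand(xml));
    }
}
```

Using DOM rather than a hand-written SAX handler keeps the sketch short; the trade-off is that the whole document is held in memory.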

A:

Is it possible for you to read the XML template in as a string? Then you could do something like this with the string:

 String s = "<title>&title;</title>";
 s = s.replace("&title;", "Stack Overflow Madness");
 saveXml(s); // hypothetical helper that writes the result out

Unfortunately, I can't. The entity resolution is much more complex than a simple replacement, so I have to use org.xml.sax.EntityResolver for it.
A:

You almost certainly cannot do this using any XML parser I've heard of, and certainly the Sun XML parsers cannot do it. They will happily discard details that have no significance as far as the meaning of the XML is concerned. For example,

<title>Stack Overflow Madness</title>

and

<title >Stack Overflow Madness</title >

are equivalent as far as XML is concerned: the parser does not even report the space inside the tag to the application, and the Sun parsers (rightly) treat the two as identical.
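To see this concretely, here is a small round-trip sketch (the class name `RoundTrip` is illustrative) that parses two spellings of the same element with the JDK's DocumentBuilder and serializes each back with the identity Transformer. The results are identical because neither the extra space inside the tags nor the choice of quote character around an attribute value reaches the parsed tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RoundTrip {
    // Parse a string into a DOM and serialize it straight back out.
    public static String roundTrip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String a = roundTrip("<title>Stack Overflow Madness</title>");
        String b = roundTrip("<title >Stack Overflow Madness</title >");
        System.out.println(a);
        System.out.println(b);
        // true: the space inside the tags was never reported by the parser
        System.out.println(a.equals(b));
    }
}
```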

I think your choices are to do the replacement treating the XML as text (as @Wololo suggests) or to relax your requirements.

By the way, you can probably use your EntityResolver independently of the XML parser, or create a class that does the same thing. That may mean String.replace is not enough, but you should be able to implement an ad-hoc expander that iterates over the characters in one character buffer, copying them into a second one and expanding entity references as it goes.
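A sketch of such an expander, assuming the real resolution logic can be wrapped in a `Function<String, String>` that returns the replacement text for an entity name, or null to leave the reference alone (the class `EntityExpander` and its methods are hypothetical). Every other character is copied verbatim, so whitespace, comments and the DOCTYPE come through exactly as written. Because it is purely textual, it does not recognize comments or CDATA sections (an `&name;` inside them would also be expanded), and it inserts the replacement text without escaping it. The `Map.of` call in `main` needs Java 9 or later.

```java
import java.util.Map;
import java.util.function.Function;

public class EntityExpander {
    // Copy `in` to a new buffer, replacing each "&name;" the resolver knows about.
    // Character references (&#65;), predefined entities the resolver returns null
    // for (&amp;), unknown names and stray '&' characters are copied unchanged.
    public static String expand(CharSequence in, Function<String, String> resolver) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '&') {
                // Scan the candidate entity name following '&'.
                int j = i + 1;
                while (j < in.length() && isNameChar(in.charAt(j))) {
                    j++;
                }
                if (j > i + 1 && j < in.length() && in.charAt(j) == ';') {
                    String value = resolver.apply(in.subSequence(i + 1, j).toString());
                    if (value != null) {
                        out.append(value); // expand the reference
                        i = j + 1;         // skip past the ';'
                        continue;
                    }
                }
            }
            out.append(c); // everything else is copied verbatim
            i++;
        }
        return out.toString();
    }

    // Simplified approximation of the XML name characters.
    private static boolean isNameChar(char ch) {
        return Character.isLetterOrDigit(ch) || ch == '_' || ch == '-' || ch == '.' || ch == ':';
    }

    public static void main(String[] args) {
        Map<String, String> entities = Map.of("title", "Stack Overflow Madness");
        String src = "<!-- kept -->\n<doc>\n  <title >&title;</title >\n  <p>a &amp; b</p>\n</doc>";
        System.out.println(expand(src, entities::get));
    }
}
```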

Stephen C
Big +1 on this. Perhaps if you (OP) were to explain **why** you need to preserve exact XML, someone would be able to suggest a better approach.
ChssPly76
There is no real reason. It would just be nicer if it were possible, and I didn't know whether it was possible with a Java XML parser, so I asked.