Hello

I am trying to write XML data using StAX where the content itself is HTML.

If I try

xtw.writeStartElement("contents");
xtw.writeCharacters("<b>here</b>");
xtw.writeEndElement();

I get this

<contents>&lt;b&gt;here&lt;/b&gt;</contents>

Then I noticed the writeCData method and changed my code to:

xtw.writeStartElement("contents");
xtw.writeCData("<b>here</b>");
xtw.writeEndElement();

and this time the result is

<contents><![CDATA[<b>here</b>]]></contents>

which is still not good. What I really want is

<contents><b>here</b></contents>

So is there an XML API/library that allows me to write raw text without putting it in a CDATA section? So far I have looked at StAX and JDOM, and they do not seem to offer this.

In the end I might resort to good old StringBuilder but this would not be elegant.

Update:

I agree mostly with the answers so far. However instead of <b>here</b> I could have a 1MB HTML document that I want to embed in a bigger XML document. What you suggest means that I have to parse this HTML document in order to understand its structure. I would like to avoid this if possible.

Answer:

It is not possible, otherwise you could create invalid XML documents.

A: 

The problem is not "here", it's <b></b>.

Add the <b> element as a child of contents and you'll be able to do it. Any library like JDOM or DOM4J will allow you to do this. The general case is to parse the content into an XML DOM and add the root element as a child of <contents>.

You can't add unescaped markup as character data outside of a CDATA section.
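
A minimal sketch of the parse-and-append approach, using the JDK's built-in DOM API instead of JDOM or DOM4J. The class and method names (DomEmbed, embed) are made up for illustration, and it assumes the fragment is well-formed XML with a single root element:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomEmbed {
    // Hypothetical helper: parses the fragment and appends its root under <contents>.
    static String embed(String fragment) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Target document with a <contents> root.
        Document doc = db.newDocument();
        Element contents = doc.createElement("contents");
        doc.appendChild(contents);

        // Parse the fragment (this throws if it is not well-formed XML).
        Document parsed = db.parse(new InputSource(new StringReader(fragment)));

        // A node belongs to its own document, so it must be imported before appending.
        contents.appendChild(doc.importNode(parsed.getDocumentElement(), true));

        // Serialize without the XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(embed("<b>here</b>"));
    }
}
```

Note that this builds the whole fragment in memory, which may matter for a very large document.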

duffymo
+2  A: 

The issue is that this is not raw text; it is an element, so you should be writing:

xtw.writeStartElement("contents");
xtw.writeStartElement("b");
xtw.writeCharacters("here");
xtw.writeEndElement();
xtw.writeEndElement();
Mark
I think the problem is that he has a blob which MAY contain tags.
ShiDoiSi
A: 

If your XML and HTML are not too big, you could make a workaround:

xtw.writeStartElement("contents");
xtw.writeCharacters("anUniqueIdentifierForReplace"); // <--
xtw.writeEndElement();

When you have your XML as a String:

xmlAsString = xmlAsString.replace("anUniqueIdentifierForReplace", yourHtmlAsString); // replace() returns a new String

I know, it's not so nice, but this could work.


Edit: Of course, you should check if yourHtmlAsString is valid.
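
The validity check could be as simple as running the string through a parser before splicing it in. This is only a sketch: the class and method names (WellFormedCheck, isWellFormed) are made up, and it tests XML well-formedness, so plain non-XHTML HTML will usually be rejected:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class WellFormedCheck {
    // Hypothetical helper: returns true if the text parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // Walk through every event; the parser throws on the first error.
            while (r.hasNext()) {
                r.next();
            }
            r.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<b>here</b>"));
        System.out.println(isWellFormed("<b>here"));
    }
}
```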

Daniel Engmann
Very clever! Thank you for this idea.
kazanaki
This is actually a very unclever hack. If you don't want the XML writer to produce a valid XML document, use String concatenation to begin with instead.
jarnbjo
If you know that you have valid XML to enter as a blob, this would work, but you are taking the risk that it is not all well-formed.
Mark
Ok! Ok! I won't use this. No need to downvote Daniel any more.
kazanaki
Any computer program obeys the "Garbage In Garbage Out" rule. This solution is no worse. Either you have valid input, and then this solution is more efficient than the others, or you don't, in which case all solutions proposed here fail to produce valid XML output. So, this solution is strictly better.
MSalters
A: 

If you want to embed a large HTML document in an XML document then CDATA imho is the way to go. That way you don't have to understand or process the internal structure and you can later change the document type from HTML to something else without much hassle. Also I think you can't embed e.g. DOCTYPE instructions directly (i.e. as structured data that retains the semantics of the DOCTYPE instruction). They have to be represented as characters.
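
One thing to watch for with CDATA: the content itself must not contain the sequence "]]>", which a large HTML document with inline scripts might. A rough sketch of the usual workaround, splitting the text over several CDATA sections (the names CDataSketch, writeSafeCData and wrap are made up for illustration):

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class CDataSketch {
    // Hypothetical helper: writes text as CDATA, splitting wherever "]]>" occurs
    // so that no single section contains the terminator.
    static void writeSafeCData(XMLStreamWriter xtw, String text) throws XMLStreamException {
        int start = 0;
        int idx;
        while ((idx = text.indexOf("]]>", start)) >= 0) {
            // End this section after "]]"; the next one starts with ">".
            xtw.writeCData(text.substring(start, idx + 2));
            start = idx + 2;
        }
        xtw.writeCData(text.substring(start));
    }

    // Wraps the given text in a <contents> element as CDATA.
    static String wrap(String html) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xtw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xtw.writeStartElement("contents");
        writeSafeCData(xtw, html);
        xtw.writeEndElement();
        xtw.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(wrap("<script>if (a[b[0]]>1) {}</script>"));
    }
}
```

A reader sees the adjacent sections as one run of text, so the original string comes back unchanged.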

(This is primarily a response to your update, but alas I don't have enough rep to comment.)

musiKk
+1  A: 

If you want the XML to be included AS XML and not as character data, then it has to be parsed at some point. If you don't want to manually do the parsing yourself, you have two alternatives:

(1) Use external parsed entities -- in this case the external file will be pulled in and parsed by the XML parser. When the output is again serialized, it will include the contents of the external file.

[ See http://www.javacommerce.com/displaypage.jsp?name=entities.sql&id=18238 ]

(2) Use XInclude -- in that case the file has to be run through an XInclude processor, which will merge the XInclude references into the output. Most XSLT processors, as well as xmllint, will also do XInclude with an appropriate option.

[ See: http://www.xml.com/pub/a/2002/07/31/xinclude.html ]

( XSLT can also be used to merge documents without using the XInclude syntax. XInclude just provides a standard syntax )
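
For option (2), the JDK's own DOM parser can act as the XInclude processor. A sketch under some assumptions: the class name XIncludeSketch and the temp file are made up, and the included file must itself be well-formed XML:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeSketch {
    // Hypothetical helper: builds a wrapper document that pulls in the given file via xi:include.
    static Document load(Path included) throws Exception {
        String wrapper =
                "<contents xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
              + "<xi:include href=\"" + included.toUri() + "\"/>"
              + "</contents>";

        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);  // XInclude processing requires namespace awareness
        f.setXIncludeAware(true);   // resolve xi:include elements while parsing

        return f.newDocumentBuilder().parse(
                new ByteArrayInputStream(wrapper.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Path part = Files.createTempFile("part", ".xml");
        Files.write(part, "<b>here</b>".getBytes(StandardCharsets.UTF_8));
        Document doc = load(part);
        System.out.println(doc.getDocumentElement().getFirstChild().getNodeName());
    }
}
```

The parser may add an xml:base attribute to the included root element; that is part of how XInclude records where the content came from.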

Steven D. Majewski
A: 

I don't see what the problem is with parsing the large block of XML you want to insert into your output. Use a StAX parser to parse it, and just write code to forward all of the events to your existing serializer (variable "xtw").
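
Roughly, the forwarding could look like the sketch below. The class and method names (ForwardEvents, copy, demo) are made up; it ignores namespaces, comments and processing instructions for brevity, and assumes the block is well-formed XML with one root element:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class ForwardEvents {
    // Hypothetical helper: reads the fragment and replays each event on xtw.
    static void copy(String fragment, XMLStreamWriter xtw) throws XMLStreamException {
        XMLStreamReader xtr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(fragment));
        while (xtr.hasNext()) {
            switch (xtr.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    xtw.writeStartElement(xtr.getLocalName());
                    for (int i = 0; i < xtr.getAttributeCount(); i++) {
                        xtw.writeAttribute(xtr.getAttributeLocalName(i), xtr.getAttributeValue(i));
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.SPACE:
                    xtw.writeCharacters(xtr.getText());
                    break;
                case XMLStreamConstants.CDATA:
                    xtw.writeCData(xtr.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    xtw.writeEndElement();
                    break;
                default:
                    break; // document start/end, comments and PIs are skipped in this sketch
            }
        }
        xtr.close();
    }

    // Writes <contents> around the forwarded fragment.
    static String demo(String html) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xtw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xtw.writeStartElement("contents");
        copy(html, xtw);
        xtw.writeEndElement();
        xtw.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(demo("<b>here</b>"));
    }
}
```

Because both sides are streaming, the 1MB document never has to be held as a tree in memory.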

Paul Clapham
A: 

If the blob of html is actually xhtml then I'd suggest doing something like (in pseudo-code):

xtw.writeStartElement("contents");
XMLReader  xtr=new XMLReader();
xtr.read(blob);
Dom dom=xtr.getDom();
for(element e:dom){
    xtw.writeElement(e);
}
xtw.writeEndElement();

or something like that. I had to do something similar once but used a different library.

Joel Bushart