views:

1287

answers:

10
+1  A: 

CDATA for simplicity.

Mohamed
A: 

Encoding it will work fine and is reliable. You can even encode sections that are already encoded without any difficulty.

Decoding will be done automatically by whatever XML parser is used to handle your encoded HTML.
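As a minimal sketch of that round trip, assuming Python's standard `xml.etree.ElementTree` as the XML library (the element name `item` and the sample HTML are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Sample HTML fragment; it already contains an entity of its own.
html = '<p class="x">Fish &amp; chips <b>today</b></p>'

item = ET.Element("item")   # hypothetical element name
item.text = html            # the serializer escapes &, < and > for us
xml_text = ET.tostring(item, encoding="unicode")

# Parsing reverses the escaping; no manual decoding step is needed.
decoded = ET.fromstring(xml_text).text
print(decoded == html)
```

The existing `&amp;` becomes `&amp;amp;` in the serialized XML and comes back unchanged after parsing, which is why encoding already-encoded content is safe.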

Brian Agnew
A: 

I think the answer depends on what you are planning to do with the HTML content, and also on what type of HTML content you plan to support.

Especially when it comes to included javascript, encoding often results in problems. CDATA definitely helps you there.

If you plan to use only small snippets (e.g. a paragraph) and have a way to preprocess or filter them (because you don't want JavaScript or fancy things anyway), you will probably be better off with encoding, or actually just putting the HTML directly as a subtree in the XML. You can then also post-process the HTML (e.g. filter out style or onclick attributes). But this is definitely more work.
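A rough sketch of that kind of post-processing, assuming the HTML is well-formed and embedded as a subtree, and using Python's `xml.etree.ElementTree`. The element names `post` and `body` are invented for the example, and this is not a complete HTML sanitizer:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with HTML embedded directly as child elements.
xml_text = (
    '<post><body>'
    '<p style="color:red" onclick="steal()">Hello '
    '<a href="/x" onmouseover="y()">link</a></p>'
    '</body></post>'
)
root = ET.fromstring(xml_text)

# Drop style attributes and anything that looks like an event handler.
for el in root.find("body").iter():
    for name in list(el.attrib):
        if name == "style" or name.startswith("on"):
            del el.attrib[name]

cleaned = ET.tostring(root, encoding="unicode")
print(cleaned)
```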

Niko
+1  A: 

I don't know what XML builder you're using, but PHP (actually libxml) knows how to handle ]]> inside CDATA sections, and so should every other XML framework. So, I'd use a CDATA section.
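For illustration, one common way to keep `]]>` safe when writing CDATA by hand is to split the section at the terminator. This is only a sketch of that technique in Python, not a description of what libxml does internally; `wrap_cdata` and the `item` element are hypothetical names:

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text):
    # Close the section between "]]" and ">" and reopen it right away,
    # so the terminator never appears inside a single CDATA section.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# JavaScript that happens to contain the "]]>" sequence.
html = "<script>if (a[b[0]]> 1) { run(); }</script>"
xml_text = "<item>" + wrap_cdata(html) + "</item>"

# The parser joins the adjacent sections back into one text value.
parsed = ET.fromstring(xml_text).text
print(parsed == html)
```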

Ionuț G. Stan
A: 
Xinus
A: 

It makes sense to wrap HTML in CDATA. The HTML text will probably constitute a single value in the XML.

So not wrapping it in CDATA will cause all XML parsers to read it as a part of the XML document. While it is easy to work around this problem when consuming the XML, why the extra headache?

If you want to actually parse the HTML into a DOM, then it's better to read the HTML text and set up a parser to read that text separately.
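A small sketch of that two-step approach, assuming Python's standard library: `xml.etree.ElementTree` pulls the HTML out as a plain string, and a separate `html.parser.HTMLParser` subclass reads it. The `LinkCollector` class and the `post`/`body` element names are made up for the example:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical helper that records href values of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

# The HTML need not be well-formed XML (note the bare <br>).
xml_text = (
    '<post><body><![CDATA[<p>See <a href="https://example.com/">this</a>'
    '<br>bye</p>]]></body></post>'
)

# Step 1: the XML parser hands back the HTML as one string.
html_text = ET.fromstring(xml_text).findtext("body")

# Step 2: a separate HTML parser reads that string.
collector = LinkCollector()
collector.feed(html_text)
collector.close()
print(collector.links)
```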

Hope that came out the way I intended it to.

Here Be Wolves
+3  A: 

CDATA is easier to read by eye, while encoded content can safely contain end-of-CDATA markers, but you don't have to care. Just use an XML library and stop worrying about it. Then all you have to say is "Put this text inside this element" and the library will either encode it or wrap it in CDATA markers.
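To make that concrete, here is a sketch using Python's `xml.dom.minidom` to write the same text both ways, and `xml.etree.ElementTree` to read it back. The element names `items`, `escaped` and `wrapped` are invented for the example:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

html = "<p>Tom & Jerry</p>"

doc = minidom.Document()
root = doc.appendChild(doc.createElement("items"))

# Option 1: a plain text node; the library escapes it on output.
escaped = root.appendChild(doc.createElement("escaped"))
escaped.appendChild(doc.createTextNode(html))

# Option 2: a CDATA section; the library adds the markers.
wrapped = root.appendChild(doc.createElement("wrapped"))
wrapped.appendChild(doc.createCDATASection(html))

xml_text = root.toxml()

# A consumer sees the same string either way.
values = [el.text for el in ET.fromstring(xml_text)]
print(values[0] == values[1] == html)
```

Which form ends up on the wire is a serialization detail; the reading side gets identical text in both cases.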

David Dorward
A: 

Personally, I hate CDATA segments, so I'd use encoding instead. Of course, if you add XML to XML to XML then this would result in encoding over encoding over encoding and thus some very unreadable results. Why do I hate CDATA segments? I wish I knew. Personal preference, mostly. I just don't like getting used to adding "forbidden characters" inside a special segment where they would suddenly be allowed again. It just confuses me when I see XML mark-up within a CDATA segment and it's not part of the XML surrounding it. At least with encoding I will see that it's encoded.
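The "encoding over encoding" effect is easy to see with a quick sketch, here using `escape` and `unescape` from Python's `xml.sax.saxutils` (the sample string is made up):

```python
from xml.sax.saxutils import escape, unescape

inner = "<b>bold</b>"

once = escape(inner)    # one level of nesting
twice = escape(once)    # nested one level deeper

print(once)    # &lt;b&gt;bold&lt;/b&gt;
print(twice)   # &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;

# Each level of parsing peels off exactly one layer.
restored = unescape(unescape(twice))
print(restored == inner)
```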

Good XML libraries will handle both encoding and CDATA segments transparently. It's just my eyes that get hurt.

Workshop Alex
+1  A: 
Ned Batchelder
A: 

If your HTML is well-formed, then just embed the HTML tags without escaping or wrapping in CDATA. If at all possible, it helps to keep your content in XML. It gives you more flexibility for transforming and manipulating the document.

You could set a namespace for the HTML, so that you could disambiguate your HTML tags from the other XML wrapping it.

Escaped text means that the entire HTML block will be one big text node. Wrapping in CDATA tells the XML parser not to parse that section. It may be "easier", but limits your abilities downrange and should only be employed when appropriate; not just because it is more convenient. Escaped markup is considered harmful.
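A brief sketch of the difference, assuming Python's `xml.etree.ElementTree` and the XHTML namespace for the embedded tags (the `item` wrapper and the `h` prefix are arbitrary choices for the example):

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# HTML embedded as real, namespaced child elements.
embedded = ET.fromstring(
    '<item xmlns:h="%s"><h:p>Hello <h:b>world</h:b></h:p></item>' % XHTML
)

# The same HTML escaped into a single text node.
escaped = ET.fromstring(
    "<item>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</item>"
)

# The embedded version can be queried like any other XML.
bold = embedded.find(".//{%s}b" % XHTML)
print(bold.text)

# The escaped version has no child elements, only one string.
print(len(escaped), repr(escaped.text))
```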

Mads Hansen