Html inside XML. Should I use CDATA or encode the HTML

+1 A:

CDATA for simplicity.

Mohamed 2009-09-09 09:37:56

A:

Encoding it will work fine and is reliable. You can encode encoded sections etc. without any difficulty.

Decoding will be done automatically by whatever XML parser is used to handle your encoded HTML.
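
As a rough illustration (a minimal sketch using Python's standard `xml.etree.ElementTree`; the element names `item` and `wrapper` and the sample HTML are made up for the example), encoding twice and parsing twice gets the original HTML back:

```python
import xml.etree.ElementTree as ET

# Sample HTML fragment (hypothetical content, for illustration only).
html = '<p class="intro">Fish &amp; chips <b>today</b></p>'

# Let the library escape the markup when serializing.
item = ET.Element("item")
item.text = html
inner_xml = ET.tostring(item, encoding="unicode")

# Encode the already-encoded document again by nesting it as text.
wrapper = ET.Element("wrapper")
wrapper.text = inner_xml
outer_xml = ET.tostring(wrapper, encoding="unicode")

# Each parse undoes exactly one level of encoding.
recovered_inner = ET.fromstring(outer_xml).text
recovered_html = ET.fromstring(recovered_inner).text
```

Here `recovered_html` should compare equal to `html`, with no manual unescaping anywhere.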

Brian Agnew 2009-09-09 09:38:44

A:

I think the answer depends on what you are planning to do with the HTML content, and also on what type of HTML content you plan to support.

Encoding often causes problems, especially with included JavaScript. CDATA definitely helps you there.

If you plan to use only small snippets (e.g. a paragraph) and have a way to preprocess/filter them (because you don't want JavaScript or fancy things anyway), you will probably be better off with encoding, or actually just putting the HTML directly into the XML as a subtree. You can then also post-process the HTML (e.g. filter out style or onclick attributes). But this is definitely more work.
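
A sketch of that post-processing step, assuming the HTML is well-formed and embedded as a subtree. The helper name `strip_unsafe_attributes` and the sample document are hypothetical, and this is only an illustration of the filtering idea, not a complete sanitizer:

```python
import xml.etree.ElementTree as ET

def strip_unsafe_attributes(root):
    """Remove style and on* (event handler) attributes in place."""
    for element in root.iter():
        # Copy the names first so we can delete while looping.
        for name in list(element.attrib):
            if name == "style" or name.lower().startswith("on"):
                del element.attrib[name]
    return root

# Hypothetical document with HTML embedded directly as child elements.
doc = ET.fromstring(
    '<item><p style="color:red" onclick="alert(1)" class="note">'
    'Hello <a href="/x" onmouseover="spy()">link</a></p></item>'
)
strip_unsafe_attributes(doc)
cleaned = ET.tostring(doc, encoding="unicode")
```

Note that the `on` prefix check is deliberately crude: it would also drop any harmless attribute whose name happens to start with "on".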

Niko 2009-09-09 09:44:09

+1 A:

I don't know what XML builder you're using, but PHP (actually libxml) knows how to handle ]]> inside CDATA sections, and so should every other XML framework. So, I'd use a CDATA section.
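
If your builder does not do this for you, the usual trick is to split the `]]>` across two CDATA sections. A minimal sketch in Python (the helper `wrap_in_cdata` is made up for this example; the parser is the standard `xml.etree.ElementTree`):

```python
import xml.etree.ElementTree as ET

def wrap_in_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# JavaScript that happens to contain the CDATA terminator "]]>".
html = "<script>if (a[b[0]]>1) { run(); }</script>"

document = "<item>" + wrap_in_cdata(html) + "</item>"

# The parser joins the adjacent CDATA sections back into one string.
recovered = ET.fromstring(document).text
```

After parsing, `recovered` should be identical to `html`, including the original `]]>`.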

Ionuț G. Stan 2009-09-09 09:44:13

A:

Xinus 2009-09-09 09:59:07

A:

It makes sense to wrap HTML in CDATA. The HTML text will probably constitute a single value in the XML.

Without CDATA, every XML parser will read the HTML as part of the XML document. That is easy to work around when consuming the XML, but why take on the extra headache?

If you want to actually parse the HTML into a DOM, then it's better to read the HTML text and set up a separate parser for that text.

Hope that came out the way I intended it to.

Here Be Wolves 2009-09-09 10:04:17

+3 A:

CDATA is easier to read by eye while encoded content can have end of CDATA markers in it safely — but you don't have to care. Just use an XML library and stop worrying about it. Then all you have to say is "Put this text inside this element" and the library will either encode it or wrap it in CDATA markers.
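
To illustrate the "you don't have to care" part, here is a small sketch with Python's standard `xml.etree.ElementTree` (the `item` element and the sample HTML are invented for the example). Both forms parse to the same text, and on output the library picks a form by itself:

```python
import xml.etree.ElementTree as ET

html = "<p>Tom &amp; Jerry</p>"

# The same HTML stored two ways.
escaped_form = "<item>&lt;p&gt;Tom &amp;amp; Jerry&lt;/p&gt;</item>"
cdata_form = "<item><![CDATA[<p>Tom &amp; Jerry</p>]]></item>"

# Reading: the parser hands back identical text for both.
text_from_escaped = ET.fromstring(escaped_form).text
text_from_cdata = ET.fromstring(cdata_form).text

# Writing: "put this text inside this element" and let the library encode.
item = ET.Element("item")
item.text = html
written = ET.tostring(item, encoding="unicode")
```

ElementTree happens to write the escaped form; other libraries may choose CDATA instead, and a consumer using a proper parser should not notice the difference.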

David Dorward 2009-09-09 10:15:04

A:

Personally, I hate CDATA segments, so I'd use encoding instead. Of course, if you add XML to XML to XML then this would result in encoding over encoding over encoding and thus some very unreadable results. Why do I hate CDATA segments? I wish I knew. Personal preference, mostly. I just don't like getting used to adding "forbidden characters" inside a special segment where they would suddenly be allowed again. It just confuses me when I see XML mark-up within a CDATA segment and it's not part of the XML surrounding it. At least with encoding I will see that it's encoded.

Good XML libraries will handle both encoding and CDATA segments transparently. It's just my eyes that get hurt.

Workshop Alex 2009-09-09 10:56:37

+1 A:

Ned Batchelder 2009-09-09 10:59:41

A:

If your HTML is well-formed, then just embed the HTML tags without escaping or wrapping in CDATA. If at all possible, it helps to keep your content in XML. It gives you more flexibility for transforming and manipulating the document.

You could set a namespace for the HTML, so that you could disambiguate your HTML tags from the other XML wrapping it.

Escaped text means that the entire HTML block will be one big text node. Wrapping in CDATA tells the XML parser not to parse that section. It may be "easier", but it limits what you can do with the content later and should only be employed when appropriate, not just because it is more convenient. Escaped markup is considered harmful.
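
A small sketch of the difference, using Python's standard `xml.etree.ElementTree`. The wrapper elements (`item`, `title`, `content`, `body`) are invented for the example; the namespace URI is the standard XHTML one:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# HTML embedded as a namespaced subtree; the wrapper has its own <title>.
embedded = (
    '<item xmlns:h="http://www.w3.org/1999/xhtml">'
    "<title>Greeting</title>"
    "<content><h:html><h:head><h:title>Page</h:title></h:head>"
    "<h:body><h:p>Hello <h:b>world</h:b></h:p></h:body></h:html></content>"
    "</item>"
)

# The same kind of markup, escaped into a single text node.
escaped = "<item><body>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</body></item>"

root = ET.fromstring(embedded)
bold = root.find(".//{%s}b" % XHTML_NS)               # HTML <b> element
wrapper_title = root.find("title")                    # the wrapper's own title
html_title = root.find(".//{%s}title" % XHTML_NS)     # the HTML title

# The escaped version has no child elements to query, only a string.
escaped_body = ET.fromstring(escaped).find("body")
child_count = len(escaped_body)
```

With the embedded form, the two `title` elements are told apart by namespace and the `<b>` can be found directly. With the escaped form, `child_count` is zero and the HTML would need a second parse before anything could be selected from it.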

Mads Hansen 2009-09-09 11:09:01
