I recently introduced HTML into some RSS feeds that I publish (which up to now only had plain text, no markup), and I was wondering which method is better: use character encoding (such as htmlspecialchars) or just encapsulate everything in CDATA?

It seems to me that CDATA might be easier, but I'm unclear as to whether there might be any reasons (subtle or otherwise) for choosing one approach versus the other. (For starters, the CDATA approach would be easier to read when viewing source...)
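
To make the two options concrete, here is a minimal Python sketch (the sample markup and the `description_text` helper are made up for illustration, and `xml.sax.saxutils.escape` stands in for PHP's `htmlspecialchars`). It builds the same `<description>` both ways and checks what an XML parser hands back in each case.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample content; note it already contains an HTML entity.
html = '<p>Fish &amp; <b>chips</b></p>'

# Option 1: entity-escape the markup (roughly what htmlspecialchars does in PHP).
escaped_item = "<item><description>" + escape(html) + "</description></item>"

# Option 2: wrap the markup in a CDATA section.
# Assumption: the HTML never contains the sequence "]]>".
cdata_item = "<item><description><![CDATA[" + html + "]]></description></item>"


def description_text(item_xml):
    """Return the text an XML parser hands back for <description>."""
    return ET.fromstring(item_xml).findtext("description")


escaped_text = description_text(escaped_item)
cdata_text = description_text(cdata_item)

# Compare what a conforming XML parser sees for each encoding.
print(escaped_text == cdata_text == html)
```

At the XML level the two forms should carry the same string, so differences between readers would come from how they treat that string afterwards rather than from the XML itself.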

A: 

At the risk of giving you the answer you may not want to hear: use Atom instead of RSS.

Atom is nicely namespaced XML, so you can mix and match XHTML right in, without having to worry about the encoding issue you ask about.

It's supported pretty much everywhere RSS is, and because it's just vanilla XML, it's easier to roll your own if you really don't want to use a library to manipulate it.

Atom is also an IETF standard, which RSS isn't.
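
As a rough illustration of mixing XHTML into Atom, here is a small Python sketch. The entry is a made-up fragment (a real entry also needs elements such as `id` and `updated`); the point is only that inline markup inside `<content type="xhtml">` lives in the XHTML namespace and is parsed as ordinary XML, with no escaping or CDATA.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, incomplete entry used only to show inline XHTML content.
entry_xml = f"""
<entry xmlns="{ATOM_NS}">
  <title>Example entry</title>
  <content type="xhtml">
    <div xmlns="{XHTML_NS}">
      <p>Fish &amp; <b>chips</b></p>
    </div>
  </content>
</entry>
""".strip()

entry = ET.fromstring(entry_xml)

# The XHTML is part of the XML tree, so it can be navigated like any element.
div = entry.find(f"{{{ATOM_NS}}}content/{{{XHTML_NS}}}div")
bold = div.find(f".//{{{XHTML_NS}}}b")
print(bold.text)
```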

jamesh
Not a direct answer...but thanks for the recommendation.
Novaktually
+2  A: 

Personally, I find CDATA easier, as it lets the subscriber display the actual HTML without their reader needing to do anything funny.

If you use HTML encoding, the subscriber's reader, or the website itself, must decode the source to display the HTML.

Mitchel Sellers
I can see how it'd be easier...but I'm curious if there are any other known advantages/disadvantages.
Novaktually
With the CDATA method, your consumers can output the HTML directly to the page. Depending on the purpose, I see that as both an advantage and a disadvantage.
Mitchel Sellers
Good point. I've had mixed results with various readers when the HTML characters are encoded -- some will render the HTML, others display the markup.
Novaktually
+2  A: 

CDATA is for any data that should not be parsed by the XML parser. Any tags not in a CDATA block will be parsed by the XML parser and may take on a different meaning.

CDATA can also incur an overhead for parsers when there is no need for it. Avoid CDATA blocks whenever you know the content won't contain HTML (or other markup); otherwise, use them.
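
One detail worth sketching: because the parser leaves CDATA content alone until it sees `]]>`, content that itself contains `]]>` would end the section early. A common workaround is to split that sequence across two CDATA sections. The `to_cdata` helper and the sample snippet below are hypothetical, not part of any feed library.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" so it cannot close the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical content that happens to contain the "]]>" sequence.
snippet = "<script>if (a[b[0]]>1) { run(); }</script>"

item_xml = "<description>" + to_cdata(snippet) + "</description>"

# The parser joins the adjacent CDATA sections back into one text value.
parsed = ET.fromstring(item_xml).text
print(parsed == snippet)
```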

That said, I do agree with jamesh, in that you should always prefer Atom over RSS. I produce a feed reader and when scraping feeds, always prefer Atom over RSS.

Ryan McCue
Gah! Guess it's time for me to stop putting off reading up on Atom.
Novaktually