Anyone know what the best practices are or have general advice around having HTML/XHTML content within an XML element? Is it best to use CDATA or to just HTML encode the HTML?
I would recommend CDATA; it will make the XML smaller and more easily human-readable.
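However, make sure to escape any ]]> inside the content as ]]]]><![CDATA[> (this closes the CDATA section after the first ]] and opens a new one for the remaining >).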
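A minimal sketch of that escaping in Python, assuming the HTML arrives as a plain string. The helper name `wrap_cdata` and the `<item>` element are made up for illustration; the idea is only that every `]]>` in the content is split across two CDATA sections, and that a parser then hands back the original string.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any embedded ']]>'."""
    # ']]>' would end the section early, so close it after ']]' and
    # reopen a new section that starts with the '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical HTML snippet that happens to contain ']]>'.
html = "<p>if (a[b[0]]> 1) { ... }</p>"
doc = "<item>" + wrap_cdata(html) + "</item>"

# The parser joins the adjacent CDATA sections back into one string.
assert ET.fromstring(doc).text == html
```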
EDIT: As other people have said, if you control the HTML that you're embedding, and you know that it will always be valid XHTML, then you should nest it directly without escaping.
However, if you don't control the HTML, I might not recommend that. Even if it's valid now, it might one day become invalid, and you do not want your system to suddenly break because of that. Obviously, this depends on the circumstances and the use case; if you want a more precise recommendation, please give us more detail.
Third option: Having HTML normally embedded in XML is far more flexible than encoding it, or embedding it with CDATA. It allows parsers to handle the entire document including the HTML in a high-level way. It allows use of XSL transformations on both the containing XML and the HTML data.
So I'd suggest directly embedding it unless your HTML is not valid XML, in which case encoding or CDATA would be the only option anyway.
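To illustrate what "handling the HTML in a high-level way" can look like, here is a small sketch using Python's standard `xml.etree.ElementTree`. The wrapper elements (`entry`, `title`, `content`) are hypothetical; only the XHTML namespace URI is standard. Because the markup is embedded as real elements rather than as a string, the parser can walk into it directly.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical wrapper document; only the XHTML namespace URI is standard.
DOC = f"""<entry>
  <title>Release notes</title>
  <content>
    <div xmlns="{XHTML_NS}">
      <p>Fixed <em>two</em> bugs and added <em>one</em> feature.</p>
    </div>
  </content>
</entry>"""


def emphasized_text(xml_text: str) -> list[str]:
    """Return the text of every XHTML <em> element in the document."""
    root = ET.fromstring(xml_text)
    # Namespace-qualified names are written as '{uri}localname'.
    return [em.text for em in root.iter(f"{{{XHTML_NS}}}em")]


print(emphasized_text(DOC))  # ['two', 'one']
```

With CDATA or entity encoding, the same query would find nothing, since the HTML would just be the text content of `content`.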
I'd go with namespaced XHTML directly in the document (as opposed to "as a string", which is what the two options you propose offer).
If you don't do that, then it makes no difference which of the two you use; a parser hands back the same string either way.
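A quick way to convince yourself that CDATA and entity encoding are interchangeable from the parser's point of view, sketched with Python's standard library. The `<item>` element and the sample HTML are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical HTML that is not well-formed XML (unclosed <p> and <br>).
html = "<p>Fish &amp; chips<br>no end tags"

as_cdata = f"<item><![CDATA[{html}]]></item>"
as_escaped = f"<item>{escape(html)}</item>"  # &, <, > become entities


def text_of(xml_text: str) -> str:
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml_text).text


# Both encodings decode to exactly the original string.
assert text_of(as_cdata) == text_of(as_escaped) == html
```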
Since a lot of HTML is not well-formed XML (e.g., missing end tags like </p> and </li>, or <br> written without the self-closing slash), it may be less work to simply use a CDATA wrapper.
It depends on where you're getting the HTML from. If you're generating it yourself, you have total control over its form, but if you're pulling it from some other source (e.g., extracting it from some other web site) you probably don't have the luxury of reformatting it to be XHTML compliant.
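If the HTML comes from a source you don't control, one possible approach is to try parsing it as XML and fall back to CDATA when that fails. The sketch below assumes Python's standard `xml.etree.ElementTree`; the function names and the `<content>` element are hypothetical. Note that it only checks well-formedness, not XHTML validity, and that HTML-only entities such as `&nbsp;` also count as a parse failure here.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(html: str) -> bool:
    """True if the fragment parses as XML once given a single root."""
    try:
        ET.fromstring(f"<wrapper>{html}</wrapper>")
        return True
    except ET.ParseError:
        return False


def embed(html: str) -> str:
    """Embed HTML directly when possible, otherwise as CDATA."""
    if is_well_formed_fragment(html):
        return f"<content>{html}</content>"
    # Split any ']]>' so it cannot terminate the CDATA section early.
    safe = html.replace("]]>", "]]]]><![CDATA[>")
    return f"<content><![CDATA[{safe}]]></content>"


print(embed("<p>ok</p>"))      # <content><p>ok</p></content>
print(embed("<p>one<br>two"))  # <content><![CDATA[<p>one<br>two]]></content>
```

Either way the result stays a well-formed XML document, so a bad fragment from the external source cannot break the file as a whole.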