
I have an XML data source that has HTML- and CSS-formatted data in one of its nodes. What is the proper way to escape this data so that I can parse the document correctly? For clarification, I am using TouchXML in Objective-C to parse the data. (Not that it should matter, but I wanted to include all pertinent information.)

Any help would be much appreciated. Thank you!

L.

A:

I'm assuming you have XML data that will have nodes containing HTML markup, not a mixture of XHTML and XML data intertwined in your document.

In this case, I generally prefer to use CDATA sections for HTML data, since they let you include any HTML you want without escaping it. Adding HTML as regular nodes in an XML document causes problems for two reasons: HTML is often not well-formed XML (for example, unclosed tags such as <br> or unquoted attribute values), and HTML named entities such as &nbsp; are not predefined in XML, so an XML parser will reject them unless they are declared.

<xmlNode>
<![CDATA[

<Any>
    <Html>
        <Tags>
            <You>
                <Want />
            </You>
        </Tags>
    </Html>
</Any>

]]>
</xmlNode>
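To see the difference outside of Objective-C, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree as a stand-in for TouchXML (this is not TouchXML's API). The HTML string is a hypothetical example containing an unclosed <br> and an &nbsp; entity, and parse_raw/parse_cdata are illustrative helper names. Embedding the HTML directly as child markup should raise a ParseError, while wrapping it in CDATA should return the HTML unchanged as the node's text.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML payload: the unclosed <br> and the HTML-only &nbsp; entity
# are fine in HTML but are not valid in an XML document.
HTML = '<p style="color: red">Hello&nbsp;world<br></p>'


def parse_raw(html: str) -> str:
    """Embed the HTML directly as child markup and return its text content."""
    root = ET.fromstring(f"<xmlNode>{html}</xmlNode>")
    return "".join(root.itertext())


def parse_cdata(html: str) -> str:
    """Wrap the HTML in a CDATA section and return it as the node's text."""
    root = ET.fromstring(f"<xmlNode><![CDATA[{html}]]></xmlNode>")
    return root.text


try:
    parse_raw(HTML)
    print("raw HTML parsed")
except ET.ParseError as err:
    # Expected: a ParseError, since &nbsp; is undefined in XML and <br> is never closed.
    print("raw HTML rejected:", err)

# The CDATA content comes back as one plain string, markup and all.
print(parse_cdata(HTML))
```

The XML parser never interprets the markup inside the CDATA section; the node's text is simply the HTML string, which you can then pass to whatever renders HTML in your app.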
Dan Herbert
Laurence Gonsalves
@Laurence, while your point is valid, the string "]]>" is unlikely to appear, since CDATA is not common in HTML (to my knowledge) and it would be rare to include that sequence unescaped as regular output in a browser. It should still be taken into consideration, though.
Dan Herbert
Let's just say that I've seen a lot of code that takes a hand-wavy approach to escaping, only to be bitten by it later when the "unlikely" actually happens.
Laurence Gonsalves
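If the "]]>" case raised in these comments does need handling, one common technique is to split every "]]>" in the payload across two adjacent CDATA sections, because a CDATA section ends at the first "]]>" and cannot contain that sequence. Below is a minimal Python sketch; to_cdata is a hypothetical helper name, the JavaScript payload is made up to trigger the problem, and ElementTree again stands in for TouchXML, on the assumption that the parser joins adjacent CDATA sections back into one text value.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap arbitrary text in CDATA, splitting any ']]>' across two sections.

    Replacing ']]>' with ']]]]><![CDATA[>' closes the current section right
    after ']]' and opens a new one that begins with '>'.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical payload that contains the problematic sequence: a[b[0]]>1
html = "<script>if (a[b[0]]>1) { alert('hi'); }</script>"
doc = f"<xmlNode>{to_cdata(html)}</xmlNode>"
print(doc)

# The parser concatenates the adjacent CDATA sections back into one string.
root = ET.fromstring(doc)
print(root.text == html)
```

Without the split, the first "]]>" would end the CDATA section early and the rest of the script would be parsed as markup.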
You have both given me the information I need. I'll try wrapping my HTML in <![CDATA[ ... ]]> blocks and see if that does the trick. Thank you both again.
Leachy Peachy