We use XML documents to interact with a customer's application. That is, we send an XML document over HTTP and receive a response XML document the same way. The customer specified two XML schemata that describe the format of the request and the reply. All was working fine until one day the customer started to use CDATA sections in the response XML. We had set up our parser without CDATA sections in mind, so we failed to interpret them.

My question is: who made a mistake here? I tried to find an answer in the XML standards, but I'm still not sure. I think I cannot prescribe or forbid the use of CDATA sections in an XSD, is that right? If so, is agreeing on an XSD not enough, so that a separate agreement has to be made about CDATA sections? Or is one obliged to be prepared to parse both CDATA and regular text?

I'm interested in both your personal views and any official statements too. Thank you!

+5  A: 

CDATA is a basic part of XML. Failing to support it means the parser is broken and is not a real XML parser, which would be able to cope with elements containing text, CDATA, entities, other elements, comments, etc.
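To illustrate the point, here is a minimal sketch using Python's standard `xml.etree.ElementTree` (my choice for the example, not the parser from the question). The element name `msg` and the sample strings are made up. The same content is written with entity escapes, as a CDATA section, and as a mix of both, and the parser hands back the same text each time.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same element content (sample data, made up here).
plain = "<msg>a &lt; b &amp;&amp; c</msg>"            # entity escapes only
cdata = "<msg><![CDATA[a < b && c]]></msg>"           # one CDATA section
mixed = "<msg>a &lt; b <![CDATA[&& c]]></msg>"        # escaped text, then CDATA

# ElementTree exposes the character data as .text, however it was written.
texts = [ET.fromstring(doc).text for doc in (plain, cdata, mixed)]

for text in texts:
    print(repr(text))
```

With an interface like this, the receiving code never needs to know which spelling the sender picked.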

Since I mentioned it in a comment on another answer, I now have an urge to repeat it here. Not expecting CDATA in XML is like not expecting fish in the sea.

David Dorward
The parser could have handled the CDATA just fine; it just provided different ways to get regular content and CDATA content. We only asked it for the regular one.
kicsit
When parsing a document (and usually when generating one) you shouldn't have to care *how* the structure is expressed. It should be handled invisibly by the parser. Making users of the parser write things like `$foo = $bar->has_cdata ? decode_cdata($bar->cdata_content) : $bar->non_cdata_content;` is just dumb. It also becomes unusable, since you can drop in and out of CDATA mode at will (so part of the element content is in a CDATA block and part isn't).
David Dorward
LOL - Point taken! :)
kicsit
+5  A: 

Many XML parsers separate out text and CDATA, which is unfortunate. The mistake was yours: there is no semantic difference between regular text chunks and CDATA, so the sender should be free to choose between them based on the needs of the text at hand.

The good news is it should be a simple matter to adapt your code.
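As a rough sketch of what that adaptation might look like, assume a DOM-style parser that reports text and CDATA as separate node types. Python's `xml.dom.minidom` behaves this way, so it stands in for the parser from the question here. The helper name `element_text` and the sample document are hypothetical.

```python
from xml.dom import Node, minidom


def element_text(element):
    """Join the direct character data of an element, text and CDATA alike."""
    wanted = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    return "".join(
        child.data for child in element.childNodes if child.nodeType in wanted
    )


# Sample reply (made up): part of the content is escaped text, part is CDATA.
doc = minidom.parseString("<msg>a &lt; b <![CDATA[&& c]]></msg>")
msg = doc.documentElement

# The DOM keeps the two pieces as separate child nodes of different types.
node_types = [child.nodeType for child in msg.childNodes]
text = element_text(msg)

print(node_types)
print(repr(text))
```

Reading only the plain text nodes would silently drop the CDATA part, which matches the failure described in the question. Treating both node types as character data makes the receiver indifferent to the sender's choice.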

Ned Batchelder
The unfortunate XML parser interface makes it partially the fault of the parser's authors.
Thilo
The only two choices for fault were sender and receiver. Parser vendor isn't allowed!
Ned Batchelder
It IS the OP's fault, but Thilo makes a good point. If there's not a semantic difference between CDATA and PCDATA, then parsers don't need to tell you which one a text element is. But some do anyway.
dan04