A: 

Why would you need to?

Martin Beckett
I don't know; maybe if you had XML embedded in an XML node. It's a contrived example, I know, and I have never actually had this problem. I'm just curious to know if it's possible.
Juan Pablo Califano
Use case: You may want to enclose free-form documentation text inside a CDATA block if you want to include HTML elements in it (assuming your XML schema doesn't allow elements from the HTML namespace). Then suppose part of the text is explaining how CDATA blocks are opened and closed.
Ates Goral
You make a good point and it definitely looks like a valid use case. So concealing the CDATA ending is the way to go? Or maybe html-encoding it? (In case there's no other choice and that's valid within a CDATA section)
Juan Pablo Califano
It's probably not a valid use case. Free-form text containing HTML markup should probably be stored in a text node with the markup characters escaped - which will be done automatically by any DOM. Your text explaining CDATA would itself have its markup characters escaped.
Robert Rossney
+29  A: 

You have to break your data into pieces to conceal the ]]>.

Here's the whole thing:

<![CDATA[]]]]><![CDATA[>]]>

The first <![CDATA[]]]]> has the ]]. The second <![CDATA[>]]> has the >.
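
As a rough illustration of the splitting trick, here is a minimal Python sketch. The helper name `wrap_in_cdata` is made up for this example and is not part of any library; the round trip through the standard `xml.etree.ElementTree` parser is only there to show that adjacent CDATA sections are read back as one piece of text.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text: str) -> str:
    """Hypothetical helper: wrap text in CDATA, splitting every ']]>'."""
    # End one section right after ']]' and open the next one before '>',
    # so that no single section ever contains the terminator ']]>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


payload = "a CDATA section ends with ]]> and nothing else"
xml_text = "<doc>" + wrap_in_cdata(payload) + "</doc>"

# The parser concatenates the adjacent CDATA sections into a single string.
roundtrip = ET.fromstring(xml_text).text
print(roundtrip == payload)
```

Calling `wrap_in_cdata("]]>")` produces exactly the string shown above.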

S.Lott
Thanks for your answer. I was rather looking for something like a backslash equivalent (as within strings in C, PHP, Java, etc.). According to the rule quoted by ddaa, it seems there's no such thing.
Juan Pablo Califano
+9  A: 
ddaa
Indeed. Well, I'm not an academic type but as I said in the question, I'm just curious about this. To be honest, I'll just take your word on this, because I can barely make sense out of the syntax used for the rule. Thanks for your answer.
Juan Pablo Califano
It reads like this: Char* (the set of all character sequences) - (except) Char* ']]>' Char* (the set of all character sequences that include the substring ']]>').
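
Read that way, the rule boils down to a substring test. A minimal Python sketch, assuming the text already consists of legal XML characters (the `Char` part of the rule is not checked here, and `is_valid_cdata_content` is a made-up name):

```python
def is_valid_cdata_content(text: str) -> bool:
    """Hypothetical check for the rule Char* - (Char* ']]>' Char*)."""
    # Any character sequence is allowed except one containing ']]>'.
    return "]]>" not in text


print(is_valid_cdata_content("<p>markup & ampersands are fine</p>"))
print(is_valid_cdata_content("but this ]]> is not"))
```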
ddaa
Thanks for the extra clarification. I'm accepting your answer as the one that better addresses the question I asked. (S. Lott's answer provides a work-around, which is fine, although it doesn't specifically deal with an actual escape char or sequence.)
Juan Pablo Califano
+4  A: 

S. Lott's answer is right: you don't encode the end tag, you break it across multiple CDATA sections.

How to run across this problem in the real world: using an XML editor to create an XML document that will be fed into a content-management system, try to write an article about CDATA sections. Your ordinary trick of embedding code samples in a CDATA section will fail you here. You can imagine how I learned this.

But under most circumstances, you won't encounter this, and here's why: if you want to store (say) the text of an XML document as the content of an XML element, you'll probably use a DOM method, e.g.:

XmlElement elm = doc.CreateElement("foo");
elm.InnerText = "<![CDATA[Is this a problem?]]>";

And the DOM quite reasonably escapes the < and the >, which means that you haven't inadvertently embedded a CDATA section in your document.
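
The same behaviour can be seen outside .NET. The following is a small sketch using Python's standard `xml.etree.ElementTree` rather than the .NET DOM from the snippet above, so treat it as an analogy and not as a statement about `XmlElement`:

```python
import xml.etree.ElementTree as ET

elm = ET.Element("foo")
# Assign the text as a plain string; the serializer does the escaping.
elm.text = "<![CDATA[Is this a problem?]]>"

serialized = ET.tostring(elm, encoding="unicode")
print(serialized)  # <foo>&lt;![CDATA[Is this a problem?]]&gt;</foo>

# Parsing the output gives back the original string, unescaped.
restored = ET.fromstring(serialized).text
print(restored == elm.text)
```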

Oh, and this is interesting:

XmlDocument doc = new XmlDocument();

XmlElement elm = doc.CreateElement("doc");
doc.AppendChild(elm);

string data = "<![CDATA[This is an embedded CDATA section]]>";
XmlCDataSection cdata = doc.CreateCDataSection(data);
elm.AppendChild(cdata);

This is probably an idiosyncrasy of the .NET DOM, but that doesn't throw an exception. The exception gets thrown here:

Console.Write(doc.OuterXml);

I'd guess that what's happening under the hood is that the XmlDocument is using an XmlWriter to produce its output, and the XmlWriter checks for well-formedness as it writes.
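
For comparison, Python's standard `xml.dom.minidom` appears to behave in a similar way: creating the CDATA node succeeds, and the check for `]]>` only happens on serialization. This sketch is about minidom only and says nothing definitive about how the .NET classes are implemented:

```python
from xml.dom.minidom import getDOMImplementation

doc = getDOMImplementation().createDocument(None, "doc", None)

data = "<![CDATA[This is an embedded CDATA section]]>"
cdata = doc.createCDATASection(data)    # no error at creation time
doc.documentElement.appendChild(cdata)  # no error when attaching either

try:
    doc.toxml()                         # serialization rejects ']]>'
    failed_on_write = False
except ValueError as exc:
    failed_on_write = True
    print("serialization failed:", exc)
```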

Robert Rossney
Well, I had an almost "real world" example. I usually load Xml from Flash that contains html markup within CDATA sections. Having a way to escape it could be useful, I guess. But anyway, in that case, the CDATA content is usually valid XHTML, and so the "outer" CDATA could be avoided altogether.
Juan Pablo Califano
CDATA can nearly always be avoided altogether. I find that people who struggle with CDATA very frequently don't understand what they're really trying to do and/or how the technology they're using really works.
Robert Rossney
Oh, I should also add that the only reason that the CMS I alluded to in my answer used CDATA was that I wrote it, and I didn't understand what I was really trying to do and/or how the technology works. I didn't need to use CDATA.
Robert Rossney
If you're using .NET, the preceding comment about CDATA being avoidable is spot on: just write the content as a string and the framework will do all the escaping (and unescaping on read) for you. From the real world:

xmlStream.WriteStartElement("UnprocessedHtml");
xmlStream.WriteString(UnprocessedHtml);
xmlStream.WriteEndElement();

Mark Mullin
+3  A: 

Breaking the CDATA into two is the right solution. The problem is by no means academic. One of the systems I am using exports XHTML templates to an XML file and does not treat CDATA right (it was in tag). This means it was unable to import its own backups without the trick. Thanks, S. Lott.

macki
A: 

If anyone is looking for a solution on this, here's an (untested) function that should handle it in PHP.

Never mind, the website seems to edit my lol.

Randy