I'm currently working on some old code that has the following construct.

Document doc = ...; // an org.w3c.dom.Document
Element root = doc.getDocumentElement();

// Use a CDATA section only when the string contains a line break.
if (string.contains("\n") || string.contains("\r")) {
    root.appendChild(doc.createCDATASection(string));
} else {
    root.appendChild(doc.createTextNode(string));
}

I cannot think of any situation that would require putting a string in a CDATA section just because it contains "\n" or "\r". I believe createTextNode does not trim or remove newlines: a string like "mytext\n\n\n" should come back unchanged, whether you set it or retrieve it.

Can somebody think of a valid/useful case where you would want to put such a string in a CDATA section?
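To make that belief concrete, here is a minimal sketch using only the DOM parser that ships with the JDK. The class and method names (TextNodeNewlines, viaTextNode, viaParse) are made up for illustration, and other DOM implementations may be configured differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TextNodeNewlines {
    // Hypothetical helper: a DocumentBuilder with the JDK's default settings.
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Store the string in a plain text node and read it straight back from the DOM.
    static String viaTextNode(String value) {
        Document doc = newBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode(value));
        return root.getTextContent();
    }

    // Parse the string as ordinary element content (no CDATA) and read it back.
    // Assumes the value contains no markup characters such as '<' or '&'.
    static String viaParse(String value) {
        try {
            Document doc = newBuilder().parse(
                    new InputSource(new StringReader("<root>" + value + "</root>")));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String s = "mytext\n\n\n";
        System.out.println(s.equals(viaTextNode(s)));
        System.out.println(s.equals(viaParse(s)));
    }
}
```

With the default JDK parser I would expect both lines to print true. The sketch deliberately sticks to "\n"; a bare "\r" is subject to XML end-of-line normalization when a document is parsed, so it is a separate question.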

A: 

I could be way off base on this, but I seem to remember it being a good recommendation to put JavaScript code inside CDATA tags. In fact, see the selected answer to this Stack Overflow question, as it does a decent job of explaining why: http://stackoverflow.com/questions/66837/javascript-cdata-tags#66865

Jordan S. Jones
clearly not using javascript here so I don't see how that's relevant
Jonathan Fingland
Yup, now that you mention it, whenever I write XHTML I always put javascript and css in CDATA tags - it just makes your life easier when you need to use ampersands freely.
Elijah
It's not javascript, but basic text.
The original poster didn't say whether it was JavaScript or not. My answer simply describes when it would be advisable to put a "string" with CR/LFs in a CDATA tag.
Jordan S. Jones
A: 

I know it sounds obvious, but one useful case is embedding a plain ASCII text file when you want to preserve the manual formatting of the file verbatim.

Another case I have encountered is outputting metadata from images, where I have no control over the formatting.
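As a rough illustration of what "verbatim" buys you, the sketch below serializes the same string once as a CDATA section and once as a text node, using the JDK's built-in identity Transformer. The class name CdataVersusText and the serialize helper are invented for this example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CdataVersusText {
    // Hypothetical helper: serialize <root> holding `value` either as a
    // CDATA section or as an ordinary text node.
    static String serialize(String value, boolean asCdata) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(asCdata ? doc.createCDATASection(value) : doc.createTextNode(value));

            // Identity transform: DOM tree in, XML text out.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = "if (a < b && c) {\n  run();\n}";
        System.out.println(serialize(value, true));   // markup characters appear literally
        System.out.println(serialize(value, false));  // '<' and '&' are written as entities
    }
}
```

Either form should parse back to the same string; the difference is in how readable the serialized file is when the content is full of characters like < and &, not in whether line breaks survive.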

Elijah
A: 

In XML, CDATA preserves whitespace; ordinary text does not.

anon
A: 

Putting text inside a CDATA section should ensure that any parser leaves it untouched, so the code above might be used to ensure correct formatting regardless of what a parser is told to do with whitespace.

I suppose it effectively says that the line breaks in that section are meaningful, not just incidental. I'm not sure why you would only use a CDATA section when line breaks are present, though, so I would guess it's a workaround rather than something by design in the code given.

Brabster
I suspect a workaround too, but unfortunately the code commit doesn't document why this is written the way it is (what's new :-) ). So I hoped that somebody would recognize a similar case.
A: 

I would say it depends entirely on whether your XML parser strips whitespace and control characters. I'm fairly certain that the System.Xml parsers in .NET don't, and neither do MSXML or Xerces, but there are options to make them do it.

Chris S
OK. In my test case I used `root.getNodeValue()` to retrieve the string. So whether I get "mytext\n\n\n" or "mytext" back depends on the XML implementation I use!? I hope this stripping of whitespace is not the default. I'll go check.
In .NET, there are properties on the XML readers that control whitespace preservation (e.g. XmlDocument.PreserveWhitespace).
Jason Williams
A: 

Since CDATA sections allow you to put arbitrary data inside an XML document without having to understand anything about how the XML objects are going to handle it, they're frequently used by people who don't understand how the XML objects work. Generally speaking, when I see someone creating CDATA in their XML I start from the assumption that they don't really know what they're doing unless they've included a good explanation. (And more often than not, that good explanation reveals that they didn't know what they were doing.)

The original developer is probably confusing the DOM's handling of text nodes that contain whitespace with its handling of text nodes that contain only whitespace. DOMs frequently normalize whitespace-only text nodes, which can be a problem in XML like:

<xsl:value-of select="foo"/>
<xsl:text>    </xsl:text>
<xsl:value-of select="bar"/>

If the DOM normalizes the four spaces in that second element down to one space, that changes the functionality of that transform, which is an unambiguously bad thing.

But there's a reason you don't see XSLT that looks like this:

<xsl:value-of select="foo"/>
<xsl:text><![CDATA[    ]]></xsl:text>
<xsl:value-of select="bar"/>

The reason is that XSLT processors are written by people who understand how the XML objects work and who know that, in their specific case, it's important to tell the DOM to preserve whitespace in whitespace-only text nodes.
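For the Java DOM in the question, a quick check along these lines shows how the bundled parser treats a whitespace-only text node with and without CDATA. This is only a sketch against the JDK's default DocumentBuilderFactory settings; WhitespaceOnlyText and rootText are names invented here, and a DOM that has been told to drop whitespace would behave differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WhitespaceOnlyText {
    // Hypothetical helper: parse the snippet with default settings and
    // return the text content of the root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Four spaces as a plain whitespace-only text node.
        System.out.println(rootText("<text>    </text>").length());
        // The same four spaces wrapped in a CDATA section.
        System.out.println(rootText("<text><![CDATA[    ]]></text>").length());
    }
}
```

With no DTD involved and the default settings, I would expect both lines to print 4, which suggests that in this particular setup the CDATA wrapper is not what keeps the whitespace alive.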

Robert Rossney