I am trying to convert a document with content like the following into another document, leaving the CDATA exactly as it was in the first document, but I haven't figured out how to preserve the CDATA with XSLT.

Initial XML:

<node>
    <subNode>
        <![CDATA[ HI THERE ]]>
    </subNode>
    <subNode>
        <![CDATA[ SOME TEXT ]]>
    </subNode>
</node>

Final XML:

<newDoc>
    <data>
        <text>
            <![CDATA[ HI THERE ]]>
        </text>
        <text>
            <![CDATA[ SOME TEXT ]]>
        </text>
    </data>
</newDoc>

I've tried something like this, but had no luck; everything gets jumbled:

<xsl:element name="subNode">
    <xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:element>

Any ideas how to preserve the CDATA?

Thanks! Lance

I'm using Ruby with Nokogiri.

Update: Here's something that works.

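<!-- Goes inside the template that matches the source element; emits the CDATA markers as raw text -->
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:value-of select="normalize-space(text())" disable-output-escaping="yes"/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>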

That will wrap all text() nodes in CDATA, which works for what I need, and it will preserve html tags inside the text.

+2  A: 

You cannot preserve the precise sequence of CDATA nodes if they're mixed with plain text nodes. At best, you can force all content of a particular element in the output to be CDATA, by listing that element name in xsl:output/@cdata-section-elements:

<xsl:output cdata-section-elements="text"/>
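As a rough sketch of how that attribute fits into a complete stylesheet for the documents in the question (the template layout is an assumption for illustration, not something taken from the original post):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Serialize the content of every output <text> element as CDATA -->
    <xsl:output method="xml" cdata-section-elements="text"/>

    <!-- Map the root <node> to <newDoc>/<data> -->
    <xsl:template match="/node">
        <newDoc>
            <data>
                <xsl:apply-templates select="subNode"/>
            </data>
        </newDoc>
    </xsl:template>

    <!-- Map each <subNode> to <text>, copying its string value -->
    <xsl:template match="subNode">
        <text><xsl:value-of select="."/></text>
    </xsl:template>
</xsl:stylesheet>

With this, the serializer wraps the whole text content of each output text element in a CDATA section, surrounding whitespace included, whether or not the input used CDATA.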
Pavel Minaev
Should I just use Ruby and maybe regular expressions to preprocess them before I do the XSLT, or something along those lines? How else would you do that? The cdata-section-elements isn't quite cutting it because I'm using variables and such. Thanks for the tip.
viatropos
If you absolutely need CDATA, then you'll have to look for something other than XSLT. That said, I'm very curious as to the reason why you need it. XDM doesn't distinguish between text and CDATA for a very good reason: no sane XML-processing application should ever assign different semantics to them, so CDATA and character escaping should be usable interchangeably.
Pavel Minaev
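To illustrate the interchangeability point with a made-up example (the content is not from the question): a conforming parser reports the same character data, a < b & c, for both of these elements, so a consumer working from the parsed tree cannot tell them apart.

<text><![CDATA[a < b & c]]></text>
<text>a &lt; b &amp; c</text>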
I am using this data in Flash, and I have heard there are lots of problems with CDATA versus no CDATA. I haven't really tried it yet, though :p
viatropos
A: 

Sorry to post an answer to my own question, but I found something that works:


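<!-- Goes inside the template that matches the source element; emits the CDATA markers as raw text -->
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:value-of select="normalize-space(text())" disable-output-escaping="yes"/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>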

That will wrap all text() nodes in CDATA, which works for what I need, and it will preserve html tags inside the text.

viatropos
I guess it's a way to get a CDATA section specifically in the output (except that the input `text()` can contain `]]>`, in which case it won't quite do what you expect), but I don't see how this would let you preserve CDATA nodes that were there in the first place, since you still have no way of distinguishing input text nodes from input CDATA nodes. Otherwise, I don't see how this is any different from `cdata-section-elements`...
Pavel Minaev
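To make the `]]>` caveat concrete, here is a hypothetical input (not from the original question) whose text contains that sequence, followed by the output each approach would be expected to produce.

Hypothetical input:

<subNode>a ]]&gt; b</subNode>

Manual wrapping with disable-output-escaping (the CDATA section ends at the first ]]>, and the trailing ]]> is left in plain character data, so the result is not well-formed):

<text><![CDATA[a ]]> b]]></text>

cdata-section-elements="text" (per the XSLT 1.0 specification for the xml output method, the serializer closes the section after ]] and opens a new one before >):

<text><![CDATA[a ]]]]><![CDATA[> b]]></text>

Note also that support for disable-output-escaping is optional for XSLT processors, and it only has an effect when the processor itself serializes the result, so the manual approach may behave differently from one toolchain to another.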