I have a weird requirement: I need to take some XML and rewrite it so that the text nodes are wrapped in CDATA (this is for a client that won't accept normal escaping).

None of the usual XML libraries (dom4j, JDOM, the built-in Java XML APIs) seem to have built-in support for this. Any ideas? Can I use XSLT for this?

I wasn't very clear. Here is what I'll start with:

<foo>This has an &amp; escaped value</foo>

What I need to do is convert this to:

<foo><![CDATA[This has an & escaped value]]></foo>

-Dave

A: 

Taking pre-made XML and parsing it with an XML parser is just going to make the parser choke on the unescaped characters. The only solution I can think of is to write your own tag-soup parser to parse it, modify it, and dump it back out as XML.

ewanm89
+1  A: 

I think this could work with an XSLT transformation, but I am not sure about the performance of the transformation. Take a look at CDATA Sections and XSLT; it may help you.

Daniel H.
This might work, but yeah, I'll need to check out the performance. Thanks!
Dave
+2  A: 

You can use XSLT to accomplish this, as long as a) all of the text you need to output is in elements, b) you only care about text nodes, c) you know the names of all the elements that contain text, and d) it's okay to emit any text in all of those output elements as CDATA. If all of those conditions hold, you could write an identity transform and add this element to it:

<xsl:output method="xml" cdata-section-elements="elm1 elm2 elm3..."/>

See the W3C XSLT recommendation on this subject.
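
As a rough illustration of the identity-transform idea, here is a minimal sketch that runs such a stylesheet through the JDK's built-in `javax.xml.transform` API. The class name `CdataViaXslt`, the inline stylesheet string, and the choice of `foo` as the only listed element (taken from the question's example) are assumptions made for illustration; a real stylesheet would list every text-bearing element name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CdataViaXslt {
    // Identity transform plus cdata-section-elements.
    // "foo" is an assumption: it is the element name used in the question.
    private static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\""
      + " cdata-section-elements=\"foo\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML string and returns the result.
    public static String transform(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<foo>This has an &amp; escaped value</foo>"));
    }
}
```

Note that text in any element not named in `cdata-section-elements` is still escaped the normal way, which is why condition c) matters.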

Robert Rossney
+1  A: 

Thanks for all of your answers. I found a way to do this using dom4j. My implementation does not work if elements have "mixed" children (i.e. both text and child elements), but in my case this isn't a problem. It works because dom4j will output CDATA sections if you add CDATA nodes:

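    import java.util.List;
    import org.dom4j.Document;
    import org.dom4j.Element;

    // Entry point: rewrites the document in place. Not safe for mixed content.
    public void replaceTextWithCdataNoMixedText(Document doc) {
        if (doc == null)
            return;
        replaceTextWithCdata(doc.content());
    }

    // Walks the element tree; any element whose text needs escaping has its
    // whole content replaced by a single CDATA node.
    private void replaceTextWithCdata(List<?> content) {
        if (content == null)
            return;
        for (Object o : content) {
            if (o instanceof Element) {
                Element e = (Element) o;
                // Note: getTextTrim() drops leading and trailing whitespace.
                String t = e.getTextTrim();
                if (textNeedsEscaping(t)) {
                    // clearContent() also removes child elements, which is
                    // why mixed content is not supported.
                    e.clearContent();
                    e.addCDATA(t);
                } else {
                    replaceTextWithCdata(e.content());
                }
            }
        }
    }

    // True if the text contains a character that XML would normally escape.
    private boolean textNeedsEscaping(String t) {
        if (t == null)
            return false;
        for (int i = 0; i < t.length(); i++) {
            char c = t.charAt(i);
            if (c == '<' || c == '>' || c == '&') {
                return true;
            }
        }
        return false;
    }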
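
For the mixed-content case that the dom4j code above does not cover, one possible variation is to replace individual text nodes rather than an element's whole content. The sketch below is not the dom4j approach: it swaps in the JDK's own `org.w3c.dom` and `javax.xml.transform` APIs, and the class name `CdataViaDom` and its method names are hypothetical. It assumes the serializer writes DOM CDATA nodes out as CDATA sections, and it does nothing special for text that itself contains `]]>`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataViaDom {
    // Replaces each text node under 'element' that contains <, > or &
    // with a CDATA section holding the same text. Child elements are kept.
    static void wrapTextInCdata(Element element) {
        Node child = element.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before replacing
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue();
                if (text.indexOf('<') >= 0 || text.indexOf('>') >= 0
                        || text.indexOf('&') >= 0) {
                    Document doc = element.getOwnerDocument();
                    element.replaceChild(doc.createCDATASection(text), child);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                wrapTextInCdata((Element) child);
            }
            child = next;
        }
    }

    // Parses the XML, rewrites its text nodes, and serializes it again.
    public static String rewrite(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        wrapTextInCdata(doc.getDocumentElement());

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rewrite("<a>x &amp; y<b>plain</b>p &lt; q</a>"));
    }
}
```

Unlike the dom4j version, this keeps the original whitespace of each text node, since nothing is trimmed.
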
Dave