ansaurus

Question

Answer 1

+4 A:

As far as I know it's not possible, which is a pity. ElementTree assumes that whoever reads its output is a fully compliant XML parser, so it shouldn't matter whether a section is written as CDATA or in some other form that yields the equivalent text.

See this thread on the Python mailing list for more info. Basically, they recommend some kind of DOM-based XML library instead.
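
To illustrate the equivalence point, here is a minimal sketch using the standard library's xml.etree.ElementTree. The two sample documents are made up for the example. It shows that the parser returns the same text for a CDATA section and for entity-escaped text, and that the CDATA wrapper is gone when the tree is written out again.

```python
import xml.etree.ElementTree as ET

# Two illustrative documents: the same text, once as CDATA and once entity-escaped.
with_cdata = ET.fromstring("<data><![CDATA[1 < 2 & 3]]></data>")
escaped = ET.fromstring("<data>1 &lt; 2 &amp; 3</data>")

# The parser hands back identical text; the CDATA wrapper is not recorded.
same_text = with_cdata.text == escaped.text

# Serializing again therefore produces escaped text, not a CDATA section.
round_trip = ET.tostring(with_cdata, encoding="unicode")
print(same_text, round_trip)
```

A compliant reader sees no difference between the two forms, which is why the library does not offer a switch for it.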

Dan 2008-10-06 16:21:58

bortzmeyer 2008-10-15 12:13:55

That's true, but some data can be dumped and parsed much more efficiently in CDATA format. So it's a pain to not be able to tell an XML library to handle it in this way.

Dan 2008-10-15 21:02:26

Answer 2

+5 A:

After a bit of work, I found the answer myself. Looking at the ElementTree.py source code, I found there was special handling of XML comments and processing instructions. What they do is create a factory function for the special element type that uses a special (non-string) tag value to differentiate it from regular elements.

def Comment(text=None):
    element = Element(Comment)
    element.text = text
    return element

Then in the _write function of ElementTree that actually outputs the XML, there's a special case handling for comments:

if tag is Comment:
    file.write("<!-- %s -->" % _escape_cdata(node.text, encoding))
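
The same marker trick can still be observed from the outside. The following is a small sketch against the standard library's xml.etree.ElementTree in Python 3, which is an assumption on my part since the snippets above come from the 2008 source; the element name and comment text are made up.

```python
import xml.etree.ElementTree as ET

root = ET.Element("data")
note = ET.Comment("generated file")
root.append(note)

# The comment node's tag is the Comment factory function itself, not a string.
tag_is_factory = note.tag is ET.Comment

# The serializer recognizes that marker and emits comment syntax instead of an element.
xml_text = ET.tostring(root, encoding="unicode")
print(tag_is_factory, xml_text)
```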

In order to support CDATA sections, I created a factory function called CDATA, extended the ElementTree class and overrode the _write function to handle the CDATA elements.

This still doesn't help if you want to parse an XML document with CDATA sections and then output it again with the CDATA sections intact, but it at least allows you to create XML documents with CDATA sections programmatically, which is what I needed to do.

The implementation seems to work with both ElementTree and cElementTree.

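# Written against the standalone ElementTree package of the time; it relies on
# the private _write method, which later ElementTree versions may not call.
import elementtree.ElementTree as etree
#~ import cElementTree as etree

def CDATA(text=None):
    # Like Comment, use the factory function itself as the tag marker.
    element = etree.Element(CDATA)
    element.text = text
    return element

class ElementTreeCDATA(etree.ElementTree):
    def _write(self, file, node, encoding, namespaces):
        if node.tag is CDATA:
            # Write the text verbatim, without escaping, inside a CDATA section.
            text = (node.text or "").encode(encoding)
            file.write("\n<![CDATA[%s]]>\n" % text)
        else:
            etree.ElementTree._write(self, file, node, encoding, namespaces)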

if __name__ == "__main__":
    import sys

    text = """
    <?xml version='1.0' encoding='utf-8'?>
    <text>
    This is just some sample text.
    </text>
    """

    e = etree.Element("data")
    cdata = CDATA(text)
    e.append(cdata)
    et = ElementTreeCDATA(e)
    et.write(sys.stdout, "utf-8")
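
One caveat: the _write override puts node.text into the CDATA section verbatim, so text that itself contains "]]>" would close the section early. A minimal sketch of the usual workaround follows, which splits the text across two adjacent CDATA sections. The helper name cdata_section is hypothetical and not part of ElementTree; the check at the end uses the Python 3 standard library parser.

```python
import xml.etree.ElementTree as ET

def cdata_section(text):
    """Wrap text in CDATA markup, splitting any ']]>' so it cannot end the section early."""
    # ']]>' becomes ']]' + end of section + start of a new section + '>'.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"

# Hypothetical payload containing the terminator and some markup-like characters.
payload = "a]]>b <x>"
wrapped = cdata_section(payload)

# A compliant parser joins the adjacent sections back into the original text.
parsed_text = ET.fromstring("<d>" + wrapped + "</d>").text
print(wrapped, parsed_text == payload)
```

The result of cdata_section could be substituted for the plain text in the file.write call above.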

gooli 2008-10-06 16:41:48

Answer 3

A:

lxml supports CDATA and has an API similar to ElementTree's.
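
A short sketch of what that looks like, assuming lxml is installed (it is a third-party package); the element name and text are made up. etree.CDATA marks text for output as a CDATA section, and a parser created with strip_cdata=False keeps CDATA sections when a document is read and written back.

```python
from lxml import etree

root = etree.Element("data")
# Wrapping the string in etree.CDATA marks the text to be serialized as a CDATA section.
root.text = etree.CDATA("1 < 2 & 3")
xml_bytes = etree.tostring(root)

# By default the parser drops the CDATA wrapper; strip_cdata=False asks it to keep it.
parser = etree.XMLParser(strip_cdata=False)
reparsed = etree.fromstring(xml_bytes, parser)
round_trip = etree.tostring(reparsed)
print(xml_bytes, round_trip)
```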

iny 2008-10-14 17:43:57

Answer 4

A:

2008-11-26 14:28:21

Answer 5

A:

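The DOM has (at least in Level 2) an interface CDATASection, and an operation Document::createCDATASection. They are extension interfaces, supported only if an implementation supports the "xml" feature.

from xml.dom import minidom

# xmlfile: path to an existing XML document; data: the text to wrap in CDATA
my_xmldoc = minidom.parse(xmlfile)

# createCDATASection only creates the node; it still has to be attached to the tree
cdata_node = my_xmldoc.createCDATASection(data)
my_xmldoc.documentElement.appendChild(cdata_node)

Now you have a CDATA node. The example appends it to the root element, but you can add it wherever you want.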

2009-02-04 06:53:24


How to output CDATA using ElementTree
