ansaurus

Question

ElementTree in Python 2.6.2 Processing Instructions support?

Answer 1

+1 A:

Yeah, I don't believe it's possible, sorry. ElementTree provides a simpler interface to (non-namespaced) element-centric XML processing than DOM, but the price for that is that it doesn't support the whole XML infoset.

There is no apparent way to represent the content that lives outside the root element (comments, PIs, the doctype and the XML declaration), and these are also discarded at parse time. (Aside: this appears to include any default attributes specified in the DTD internal subset, which makes ElementTree strictly-speaking a non-compliant XML processor.)

You can probably work around it by subclassing or monkey-patching the write() method of the pure-Python ElementTree implementation so that it calls _write on your extra PIs before it calls _write on _root, but that relies on private methods and could be a bit fragile.
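A less invasive variant of the same idea is to leave write() alone and serialize the prolog yourself. The following is only a sketch, written against the ElementTree API in current Python 3 rather than the 2.6 internals mentioned above; serialize_with_pis and the stylesheet values are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def serialize_with_pis(root, pis, encoding="utf-8"):
    """Return an XML document string with PIs emitted before the root.

    `pis` is a list of (target, text) pairs. This helper is hypothetical;
    it is not part of ElementTree.
    """
    # The XML declaration is written by hand so it comes first.
    parts = ['<?xml version="1.0" encoding="%s"?>' % encoding]
    for target, text in pis:
        # ProcessingInstruction() builds a special element that
        # ElementTree serializes as <?target text?>.
        pi = ET.ProcessingInstruction(target, text)
        parts.append(ET.tostring(pi, encoding="unicode"))
    # The root element and everything below it is serialized as usual.
    parts.append(ET.tostring(root, encoding="unicode"))
    return "\n".join(parts)


root = ET.Element("prefs")
ET.SubElement(root, "user_name").text = "John Doe"

document = serialize_with_pis(
    root, [("xml-stylesheet", 'type="text/xsl" href="style.xsl"')]
)
print(document)
```

The result is a text string, so it should be encoded with the same encoding named in the declaration when it is written to a file. This only covers output; PIs in the input are still dropped by the default parser.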

If you need support for the full XML infoset, it is probably best to stick with DOM.
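For comparison, here is a minimal sketch of the DOM route using the standard library's xml.dom.minidom. The sample document and the "app-hint" PI are made up for illustration. It shows that nodes before the root element survive parsing and that a new PI can be inserted ahead of the root.

```python
from xml.dom import minidom

SOURCE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    '<!-- this is a comment -->'
    '<prefs><user_name>John Doe</user_name></prefs>'
)

doc = minidom.parseString(SOURCE)

# Everything that is a child of the document but not the root element:
# here, the stylesheet PI and the comment.
prolog = [node for node in doc.childNodes if node is not doc.documentElement]
stylesheet = prolog[0]
print(stylesheet.target, stylesheet.data)

# Create a new PI and place it directly before the root element.
hint = doc.createProcessingInstruction("app-hint", 'mode="draft"')
doc.insertBefore(hint, doc.documentElement)

output = doc.toxml()
print(output)
```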

bobince 2009-09-29 00:52:37

Answer 2

A:

I don't know much about ElementTree. But it is possible that you might be able to solve your problem using a library I wrote called "xe".

xe is a set of Python classes designed to make it easy to create structured XML. I haven't worked on it in a long time, for various reasons, but I'd be willing to help you if you have questions about it, or need bugs fixed.

It has the bare bones of support for things like processing instructions, and with a little bit of work I think it could do what you need. (When I started adding processing instructions, I didn't really understand them, and I didn't have any need for them, so the code is sort of half-baked.)

Take a look and see if it seems useful.

http://home.avvanta.com/~steveha/xe.html

Here's an example of using it:

import xe
doc = xe.XMLDoc()

prefs = xe.NestElement("prefs")
prefs.user_name = xe.TextElement("user_name")
prefs.paper = xe.NestElement("paper")
prefs.paper.width = xe.IntElement("width")
prefs.paper.height = xe.IntElement("height")

doc.root_element = prefs


prefs.user_name = "John Doe"
prefs.paper.width = 8
prefs.paper.height = 10

c = xe.Comment("this is a comment")
doc.top.append(c)

If you ran the above code and then ran print doc, here is what you would get:

<?xml version="1.0" encoding="utf-8"?>
<!-- this is a comment -->
<prefs>
    <user_name>John Doe</user_name>
    <paper>
        <width>8</width>
        <height>10</height>
    </paper>
</prefs>

If you are interested in this but need some help, just let me know.

Good luck with your project.

steveha 2009-09-29 04:35:44

Answer 3

+3 A:

Try the lxml library: it follows the ElementTree API and adds a lot of extras. From the compatibility overview:

ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by the Comment element.

You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use the etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close to the ElementTree parser as possible.

Not in the stdlib, I know, but in my experience the best bet when you need stuff that the standard ElementTree doesn't provide.
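A short sketch of the behaviour described in that overview, assuming lxml is installed; the sample document and the "app-hint" PI are invented for illustration:

```python
from lxml import etree

SOURCE = (
    b'<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    b'<prefs><?app-hint draft?><user_name>John Doe</user_name></prefs>'
)

# Default parser: PIs are kept, both before the root and inside it.
root = etree.fromstring(SOURCE)
top_pi = root.getprevious()   # the PI that precedes the root element
inner_pi = root[0]            # the PI that is the first child of <prefs>
print(top_pi.target, inner_pi.target, inner_pi.text)

# Serializing the whole tree (not just the root) keeps the leading PI.
round_trip = etree.tostring(root.getroottree())
print(round_trip)

# Opting out: drop PIs at parse time.
stripped_root = etree.fromstring(SOURCE, etree.XMLParser(remove_pis=True))

# ETCompatXMLParser aims for ElementTree-like defaults,
# which also means comments and PIs are dropped.
compat_root = etree.fromstring(SOURCE, etree.ETCompatXMLParser())
```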

2009-09-29 21:15:50
