views:

607

answers:

3

I'm trying to create XML using the ElementTree object structure in python. It all works very well except when it comes to processing instructions. I can create a PI easily using the factory function ProcessingInstruction(), but it doesn't get added into the elementtree. I can add it manually, but I can't figure out how to add it above the root element where PI's are normally placed. Anyone know how to do this? I know of plenty of alternative methods of doing it, but it seems that this must be built in somewhere that I just can't find.

+1  A: 

Yeah, I don't believe it's possible, sorry. ElementTree provides a simpler interface to (non-namespaced) element-centric XML processing than DOM, but the price for that is that it doesn't support the whole XML infoset.

There is no apparent way to represent the content that lives outside the root element (comments, PIs, the doctype and the XML declaration), and these are also discarded at parse time. (Aside: this appears to include any default attributes specified in the DTD internal subset, which makes ElementTree strictly-speaking a non-compliant XML processor.)

You can probably work around it by subclassing or monkey-patching the Python native ElementTree implementation's write() method to call _write on your extra PIs before _writeing the _root, but it could be a bit fragile.

If you need support for the full XML infoset, probably best stick with DOM.

bobince
A: 

I don't know much about ElementTree. But it is possible that you might be able to solve your problem using a library I wrote called "xe".

xe is a set of Python classes designed to make it easy to create structured XML. I haven't worked on it in a long time, for various reasons, but I'd be willing to help you if you have questions about it, or need bugs fixed.

It has the bare bones of support for things like processing instructions, and with a little bit of work I think it could do what you need. (When I started adding processing instructions, I didn't really understand them, and I didn't have any need for them, so the code is sort of half-baked.)

Take a look and see if it seems useful.

http://home.avvanta.com/~steveha/xe.html

Here's an example of using it:

import xe
doc = xe.XMLDoc()

prefs = xe.NestElement("prefs")
prefs.user_name = xe.TextElement("user_name")
prefs.paper = xe.NestElement("paper")
prefs.paper.width = xe.IntElement("width")
prefs.paper.height = xe.IntElement("height")

doc.root_element = prefs


prefs.user_name = "John Doe"
prefs.paper.width = 8
prefs.paper.height = 10

c = xe.Comment("this is a comment")
doc.top.append(c)

If you ran the above code and then ran print doc here is what you would get:

<?xml version="1.0" encoding="utf-8"?>
<!-- this is a comment -->
<prefs>
    <user_name>John Doe</user_name>
    <paper>
     <width>8</width>
     <height>10</height>
    </paper>
</prefs>

If you are interested in this but need some help, just let me know.

Good luck with your project.

steveha
+3  A: 

Try the lxml library: it follows the ElementTree api, plus adds a lot of extra's. From the compatibility overview:

ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by the Comment element.

You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use the etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close to the ElementTree parser as possible.

Not in the stdlib, I know, but in my experience the best bet when you need stuff that the standard ElementTree doesn't provide.