I like Python, but I don't want to write 10 lines just to get an attribute from an element. Maybe it's just me, but minidom isn't that mini. The code I have to write in order to parse something using it looks a lot like Java code.

Is there something more user-friendly? Something with overloaded operators that maps elements to objects?

I'd like to be able to access this:


<root>
<node value="30">text</node>
</root>

as something like this:


obj = parse(xml_string)
print obj.node.value

and not using getChildren or some other methods like that.

+17  A: 

You should take a look at ElementTree. It doesn't do exactly what you want, but it's a lot better than minidom. If I remember correctly, it has been included in the standard library since Python 2.5. For more speed, use cElementTree. For even more speed (and more features), you can use lxml (check its objectify API, which is closer to the approach you describe).

I should add that BeautifulSoup does part of what you want. There's also Amara, which takes this approach.
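For comparison with minidom, here is a minimal sketch of reading the attribute from the question's sample XML with the standard-library ElementTree. The variable names are only illustrative, and this is not the obj.node.value syntax the question asks for, just the shortest ElementTree equivalent I can think of:

```python
import xml.etree.ElementTree as ET

# Sample document from the question, as a string.
xml_string = '<root>\n<node value="30">text</node>\n</root>'

# fromstring() returns the root Element directly.
root = ET.fromstring(xml_string)

# find() returns the first direct child with the given tag (or None).
node = root.find("node")

# Attributes are read with get(); the element's text is in .text.
value = node.get("value")
text = node.text

print(value)
print(text)
```

Attribute values come back as strings, so convert them yourself (for example with int(value)) if you need another type.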

Etienne
+1, ElementTree is excellent.
Mark
ElementTree is excellent.
Andrew Sledge
Agreed, ElementTree is super easy to use. Not so great with fancy namespaces (yet), but getting better all the time. Avoid minidom when possible.
Salim Fadhley
+2  A: 

I actually wrote a library that does things exactly the way you imagined it. The library is called "xe" and you can get it from: http://home.avvanta.com/~steveha/xe.html

xe can import XML to let you work with the data in an object-oriented way. It actually uses xml.dom.minidom to do the parsing, but then it walks over the resulting tree and packs the data into xe objects.
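To illustrate the general idea of parsing with minidom and then packing the tree into plain objects, here is a rough sketch. It is not xe's actual code or API: the Element class and the pack() and parse() helpers are hypothetical names made up for this example, and it assumes each child tag appears only once and does not clash with the names "name", "attrs" or "text".

```python
from xml.dom import minidom


class Element(object):
    """Hypothetical container: child elements become Python attributes,
    XML attributes are stored in the .attrs dict."""

    def __init__(self, name):
        self.name = name
        self.attrs = {}
        self.text = ""


def pack(dom_node):
    """Recursively copy a minidom element into an Element object."""
    elem = Element(dom_node.tagName)
    # Copy the XML attributes (values are always strings).
    for key, val in dom_node.attributes.items():
        elem.attrs[key] = val
    for child in dom_node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            # A repeated tag would overwrite the earlier child here.
            setattr(elem, child.tagName, pack(child))
        elif child.nodeType == child.TEXT_NODE:
            elem.text += child.data
    elem.text = elem.text.strip()
    return elem


def parse(xml_string):
    """Parse an XML string and return the packed root element."""
    return pack(minidom.parseString(xml_string).documentElement)


root = parse('<root>\n<node value="30">text</node>\n</root>')
print(root.node.attrs["value"])
print(root.node.text)
```

A real library has to handle repeated elements, name clashes and serialization back to XML, which this sketch deliberately ignores.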

EDIT: Okay, I went ahead and implemented your example in xe, so you can see how it works. Here are classes to implement the XML you showed:

import xe

class Node(xe.TextElement):
    def __init__(self, text="", value=None):
        xe.TextElement.__init__(self, "node", text)
        if value is not None:
            self.attrs["value"] = value

class Root(xe.NestElement):
    def __init__(self):
        xe.NestElement.__init__(self, "root")
        self.node = Node()

And here is an example of using the above. I put your sample XML into a file called "example.xml", but you could also just put it into a string and pass the string.

>>> root = Root()
>>> print root
<root/>
>>> root.import_xml("example.xml")
<Root object at 0xb7e0c52c>
>>> print root
<root>
    <node value="30">text</node>
</root>
>>> print root.node.attrs["value"]
30
>>>

Note that in this example, the type of "value" will be a string. If you really need attributes of another type, that's possible too with a little bit of work, but I didn't bother for this example. (If you look at PyFeed, there is a class for OPML that has an attribute that isn't text.)

steveha