ansaurus

Question

Is there any Python XML parser that was designed with humans in mind?

Answer 1

+17 A:

You should take a look at ElementTree. It's not doing exactly what you want but it's a lot better then minidom. If I remember correctly, starting from python 2.4, it's included in the standard libraries. For more speed use cElementTree. For more more speed (and more features) you can use lxml (check the objectify API for your needs/approach).

I should add that BeautifulSoup do partly what you want. There's also Amara that have this approach.

Etienne 2009-09-29 17:35:42

+1, ElementTree is excellent.

Mark 2009-09-29 17:41:50

ElementTree is excellent.

Andrew Sledge 2009-09-29 18:28:02

Agreed, ElementTree is super-easy to use. Not so great with fancy-namespaces (yet) but getting better all the time. Avoid minidom when possible.

Salim Fadhley 2009-09-29 20:04:12

Answer 2

+2 A:

I actually wrote a library that does things exactly the way you imagined it. The library is called "xe" and you can get it from: http://home.avvanta.com/~steveha/xe.html

xe can import XML to let you work with the data in an object-oriented way. It actually uses xml.dom.minidom to do the parsing, but then it walks over the resulting tree and packs the data into xe objects.

EDIT: Okay, I went ahead and implemented your example in xe, so you can see how it works. Here are classes to implement the XML you showed:

import xe

class Node(xe.TextElement):
    def __init__(self, text="", value=None):
        xe.TextElement.__init__(self, "node", text)
        if value is not None:
            self.attrs["value"] = value

class Root(xe.NestElement):
    def __init__(self):
        xe.NestElement.__init__(self, "root")
        self.node = Node()

And here is an example of using the above. I put your sample XML into a file called "example.xml", but you could also just put it into a string and pass the string.

>>> root = Root()
>>> print root
<root/>
>>> root.import_xml("example.xml")
<Root object at 0xb7e0c52c>
>>> print root
<root>
    <node value="30">text</node>
</root>
>>> print root.node.attrs["value"]
30
>>>

Note that in this example, the type of "value" will be a string. If you really need attributes of another type, that's possible too with a little bit of work, but I didn't bother for this example. (If you look at PyFeed, there is a class for OPML that has an attribute that isn't text.)

steveha 2009-09-29 18:32:14

ansaurus

tags:

views:

answers:

Is there any Python XML parser that was designed with humans in mind?

related questions