I'd like to parse a small, simple XML file using Python; however, work on PyXML seems to have ceased. I'd like to use Python 2.6 if possible. Can anyone recommend an XML parser that works with 2.6?

Thanks

+11  A: 

If it's small and simple then just use the standard library:

from xml.dom.minidom import parse
doc = parse("filename.xml")

This returns a DOM tree that implements the standard Document Object Model API.
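To make the DOM API a bit more concrete, here is a minimal sketch of reading attributes out of the parsed tree. The sample document, the `server` element, and the `read_servers` helper are assumptions made up for illustration; only `parseString`, `getElementsByTagName`, and `getAttribute` come from the standard library.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; with a real file you would call parse("filename.xml").
SAMPLE = (
    '<config>'
    '<server name="alpha" port="8080"/>'
    '<server name="beta" port="9090"/>'
    '</config>'
)


def read_servers(xml_text):
    """Return a list of (name, port) tuples, one per <server> element."""
    doc = parseString(xml_text)
    servers = []
    # getElementsByTagName searches the whole tree in document order.
    for node in doc.getElementsByTagName("server"):
        name = node.getAttribute("name")
        port = int(node.getAttribute("port"))
        servers.append((name, port))
    return servers


servers = read_servers(SAMPLE)
```

Attribute values always come back as strings, so the port is converted explicitly.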

If you later need to do complex things like schema validation or XPath querying then I recommend the third-party lxml module, which is a wrapper around the popular libxml2 C library.

Eli Courtwright
Thanks! Far too easy.
Alex
+3  A: 

Would lxml suit your needs? It's the first tool I turn to for XML parsing.

Il-Bhima
Additionally, Python 2.5+ ships ElementTree in the standard library as xml.etree.ElementTree. Its API amounts to a subset of lxml's. I use etree for simple XML processing and lxml when I need anything that etree doesn't quite cover.
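As a rough illustration of the simple processing mentioned here, the sketch below uses only the standard-library `xml.etree.ElementTree` module. The sample document and its `book`/`title` element names are invented for the example, and it sticks to plain tag paths, which should be safe on older ElementTree versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this example.
SAMPLE = """<library>
  <book id="1"><title>Dive Into Python</title></book>
  <book id="2"><title>Learning Python</title></book>
</library>"""

# fromstring parses XML text and returns the root element.
root = ET.fromstring(SAMPLE)

# findall("book") returns the direct <book> children of the root.
books = root.findall("book")

# findtext returns the text of the first matching child element.
titles = [book.findtext("title") for book in books]

# get reads an attribute value (always a string).
ids = [book.get("id") for book in books]
```

To read from a file instead, `ET.parse("filename.xml").getroot()` gives the same kind of root element.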
Jason R. Coombs
+2  A: 

Here is also a very good example of how to use minidom, along with explanations.

Andrei Vajna II
+1  A: 

A few years ago, I wrote a library for working with structured XML. It makes XML simpler by making some limiting assumptions.

You could use XML for something like a word processor document, in which case you have a complicated soup of content with XML tags embedded all over the place; my library would not be a good fit for that.

But if you are using XML for something like a config file, my library is rather convenient. You define classes that describe the structure of the XML you want, and once you have the classes done, there is a method to slurp in XML and parse it. The actual parsing is done by xml.dom.minidom, but then my library extracts the data and puts it in the classes.

The best part: you can declare a "Collection" type that will be a Python list with zero or more other XML elements inside it. This is great for things like Atom or RSS feeds (which was the original reason I designed the library).
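The general idea of mapping XML onto classes can be sketched with minidom alone. To be clear, this is not the library's API: `Feed`, `Entry`, `from_xml`, `from_element`, and `_child_text` are hypothetical names invented for this illustration, and the sample feed is made up.

```python
from xml.dom.minidom import parseString


def _child_text(element, tag):
    """Return the text of the first direct child named `tag`, or '' if absent."""
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag:
            return "".join(
                n.data for n in child.childNodes if n.nodeType == n.TEXT_NODE
            )
    return ""


class Entry(object):
    """Hypothetical class describing one <entry> element."""

    def __init__(self, title, link):
        self.title = title
        self.link = link

    @classmethod
    def from_element(cls, element):
        return cls(_child_text(element, "title"), _child_text(element, "link"))


class Feed(object):
    """Hypothetical class describing a <feed> with zero or more entries."""

    def __init__(self, title, entries):
        self.title = title
        self.entries = entries  # a plain Python list of Entry objects

    @classmethod
    def from_xml(cls, xml_text):
        root = parseString(xml_text).documentElement
        entries = [Entry.from_element(e) for e in root.getElementsByTagName("entry")]
        return cls(_child_text(root, "title"), entries)


# Made-up sample feed for the example.
SAMPLE = (
    "<feed><title>News</title>"
    "<entry><title>First</title><link>http://example.com/1</link></entry>"
    "<entry><title>Second</title><link>http://example.com/2</link></entry>"
    "</feed>"
)

feed = Feed.from_xml(SAMPLE)
```

Once parsed, `feed.entries` behaves like any other list, so a feed with no entries simply yields an empty one.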

Here's the URL: http://home.avvanta.com/~steveha/xe.html

I'd be happy to answer questions if you have any.

steveha
+3  A: 

For most of my tasks I have used minidom, the lightweight DOM implementation in the standard library. This example is from the official documentation:

from xml.dom.minidom import parse, parseString

dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name

datasource = open('c:\\temp\\mydata.xml')
dom2 = parse(datasource)   # parse an open file

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')  # parse XML held in a string
Toxinide