tags:

views:

1365

answers:

6

I'm searching for an easy to handle python native module to create python object representation from xml.

I found several modules via google (one of them is XMLObject) but didn't want to try out all of them.

What do you think is the best way to do such things?

EDIT: I missed to mention that the XML I'd like to read is not generated by me. It's an existing XML file in a structure of which I have no control over.

+1  A: 

I've heard the easiest is ElementTree, though I rarely work with XML and I can't say anything from experience.

Milen A. Radev
+3  A: 

I second the suggestion of xml.etree.ElementTree, mostly because it's now in the stdlib. There is also a faster implementation, xml.etree.cElementTree available too.

If you really need performance, I would suggest lxml

http://www.ibm.com/developerworks//xml/library/x-hiperfparse/

JimB
+2  A: 

Python has pickle and cPickle modules for Python object serialization. Both of these modules provide functionality to serialize/deserialize Python object hierarchy to convert to/from a byte stream:

The following provides similar interface: pickle(), unpickle() for serialization to/from XML

I'm sorry .. I missed to mention that the XML I'd like to read is not generated by me. It's an existing XML file in a structure of which I have no control over.
Martin
A: 

I use (and like) PyRXP, which creates a tuple built from the XML document.

The main issue with a straight XML -> python object structure is that there is no python analog for a attributed list - that is, a list with elements, that also happens to have attributes. If you like, it is both a list and a dictionary at the same time.

I parse the result from PyRXP, and create the list/dictionary depending upon the structure - the XML I am dealing with is either list or attribute-based, never both. (I am consuming data from a known source).

Matthew Schinckel
PyRXP is not a native python module, or is it?
Martin
No, it's a 3rd party module, possibly written in C. I must have missed that part of the question.
Matthew Schinckel
+5  A: 

You say you want an object representation, which I would interpret to mean that nodes become objects, and the attributes and children of the node are represented as attributes of the object (possibly according to some Schema). This is what XMLObject does, I believe.

There are some packages that I know of. 4Suite includes some tools to do this, and I believe Amara specifically implements this (built on top of 4Suite). You can also use lxml.objectify, which was inspired by Amara and gnosis.xml.objectify.

Of course a third option is, given a concrete representation of the XML (using ElementTree or lxml) you can build your own custom model around that. lxml.html is an example of that, extending the base interface of lxml with some HTML-specific functionality.

ianb
Agree with your interpretation.
Ali A
Is 4Suite and lxml written in python? I didn't want to install binaries to solve such a trivial problem.
Martin
+1  A: 

There's also the excellent 3rd party library pyxser for Python.

pyxser stands for Python XML Serialization and is a Python object to XML serializer and deserializer. In other words, it can convert a Python object into XML and also, convert that XML back into the original Python object.

Kenny M.