XML Processing in Python

tags:

xml
python

views:

1876

answers:

+13 Q:

XML Processing in Python

I'm about to build a piece of a project that will need to build and post an xml document to a web service, and I'd like to do it in Python as a means to expand my skills there. Unfortunately, while I know the XML model fairly well in .Net, I'm uncertain what the pros and cons are of the XML models in Python.

Anyone have experience doing XML processing in Python? Where would you suggest I start. The XML files I'll be building will be fairly simple.

Dive Into Python has a chapter. Can't vouch for how good it would be though.

Bernard 2008-08-02 03:43:33

+4 A:

Personally, I've played with several of the built-in options on an XML-heavy project and have settled on pulldom as the best choice for less complex documents.

Especially for small simple stuff, I like the event-driven theory of parsing rather than setting up a whole slew of callbacks for a relatively simple structure. Here is a good quick discussion of how to use the API.

What I like: you can handle the parsing in a for loop rather than using callbacks. You also delay full parsing (the "pull" part) and only get additional detail when you call expandNode(). This satisfies my general requirement for "responsible" efficiency without sacrificing ease of use and simplicity.

saint_groceon 2008-08-02 04:01:34

Isn't pulldom is a tool for parsing XML, not generating it (which is what the question asks about)?

cunkel 2008-09-20 01:07:39

+18 A:

ElementTree has a nice pythony API. I think it's even shipped as part of python 2.5

It's in pure python and as I say, pretty nice, but if you wind up needing more performance, then lxml exposes the same API and uses libxml2 under the hood. You can theoretically just swap it in when you discover you need it.

Gareth Simpson 2008-08-02 15:21:03

To complete your answer, can you add that lxml also support XML schema and XPath, which is not supported by ElementTree? And it's indeed shipped with Python 2.5.

Torsten Marek 2008-09-23 21:09:50

ElementTree is good until you need to deal with namespaces then it falls apart and it's unusable.

Ryu 2009-07-25 22:45:56

+1 A:

Since you mentioned that you'll be building "fairly simple" XML, the minidom module (part of the Python Standard Library) will likely suit your needs. If you have any experience with the DOM representation of XML, you should find the API quite straight forward.

Matt 2008-08-02 18:04:10

+1 A:

I write a SOAP server that receives XML requests, and creates XML responses. (Unfortunately, it's not my project, so it's closed source, but that's another problem).

It turned out for me that creating (SOAP) XML documents is fairly simple, if you have a data structure that "fits" the schema.

I keep the envelope, since the response envelope is (almost) the same as the request envelope. Then, since my data structure is a (possibly nested) dictionary, I create a string that turns this dictionary into <key>value</key> items.

This is a task that recursion makes simple, and I end up with the right structure. This is all done in python code, and is currently fast enough for production use.

You can also (relatively) easily build lists as well, although depending upon your client, you may hit problems unless you give length hints.

For me, this was much simpler, since a dictionary is a much easier way of working than some custom class. For the books, generating XML is much easier than parsing!

Matthew Schinckel 2008-08-03 08:34:57

+1 A:

I personally think that chapter from Dive into Python is great. Check that out first - it uses the minidom module and is a pretty good piece of writing.

Bartosz Radaczyński 2008-08-11 18:02:59

+1 A:

I recently started using Amara with success.

Norm 2008-08-11 22:40:27

+2 A:

There are 3 major ways of dealing with XML, in general: dom, sax, and xpath. The dom model is good if you can afford to load your entire xml file into memory at once, and you don't mind dealing with data structures, and you are looking at much/most of the model. The sax model is great if you only care about a few tags, and/or you are dealing with big files and can process them sequentially. The xpath model is a little bit of each -- you can pick and choose paths to the data elements you need, but it requires more libraries to use.

If you want straightforward and packaged with Python, minidom is your answer, but it's pretty lame, and the documentation is "here's docs on dom, go figure it out". It's really annoying.

Personally, I like cElementTree, which is a faster (c-based) implementation of ElementTree, which is a dom-like model.

I've used sax systems, and in many ways they're more "pythonic" in their feel, but I usually end up creating state-based systems to handle them, and that way lies madness (and bugs).

I say go with minidom if you like research, or ElementTree if you want good code that works well.

Kent Quirk 2008-09-16 04:35:28

In Python, there are others ways, such as ElementTree (see Gareth Simpson's reply)

bortzmeyer 2008-10-15 08:27:46

+1 A:

I assume that the .Net-way of processing XML builds on ´som version of MSXML and it that case I assume that using for example minidom would make you feel somewhat at home. However, if it is simple processing you are doing any library will probably do.

Me too prefers working with ElementTree when dealing with xml in Python, it is a very neat library.

Mingus Rude 2008-09-16 06:20:01

+2 A:

I've used ElementTree for several projects and recommend it.

It's pythonic, comes 'in the box' with Python 2.5, including the c version cElementTree (xml.etree.cElementTree) which is 20 times faster than the pure Python version, and is very easy to use.

lxml has some perfomance advantages, but they are uneven and you should check the benchmarks first for your use case.

As I understand it, ElementTree code can easily be ported to lxml.

2008-09-23 19:42:58

+1 A:

If you're going to be building SOAP messages, check out soaplib. It uses ElementTree under the hood, but it provides a much cleaner interface for serializing and deserializing messages.

William McVey 2008-10-13 22:17:10

It depends a bit on how complicated the document needs to be.

I've used minidom a lot for writing XML, but that's usually been just reading documents, making some simple transformations, and writing them back out. That worked well enough until I needed the ability to order element attributes (to satisfy an ancient application that doesn't parse XML properly). At that point I gave up and wrote the XML myself.

If you're only working on simple documents, then doing it yourself can be quicker and simpler than learning a framework. If you can conceivably write the XML by hand, then you can probably code it by hand as well (just remember to properly escape special characters, and use str.encode(codec, errors="xmlcharrefreplace")). Apart from these snafus, XML is regular enough that you don't need a special library to write it. If the document is too complicated to write by hand, then you should probably look into one of the frameworks already mentioned. At no point should you need to write a general XML writer.

giltay 2008-10-14 18:26:04

+1 A:

I would recommend lxml.

Eric Snow 2010-08-13 08:53:25

ansaurus

tags:

views:

answers:

XML Processing in Python

related questions