EDIT
The use of the phrase "bad at XML" in this question has been a point of contention, so I'd like to start out by providing a very clear definition of what I mean by this term in this context: if support for standard XML APIs is poor, and forces one to use a language-specific API, in which namespaces seem to be an afterthought, then I would be inclined to characterize that language as being not as well suited to using XML as other mainstream languages that do not have these issues. "Bad at XML" is just a shorthand for these conditions, and I think it is a fair way to characterize it. As I will describe, my initial experience with Python has raised concerns about whether it fulfils these conditions; but, because in general my experience with Python has been quite positive, it seems likely that I'm missing something, thus motivating this question.
I'm trying to do some very simple XML processing with Python. I had initially hoped to be able to reuse my knowledge of the standard W3C DOM APIs, and happily found that the xml.dom and xml.dom.minidom modules did a good job of supporting these APIs. Unfortunately, however, serialization proved to be problematic, for the following reasons:
- xml.dom does not come with a serializer
- the PyXML library, which includes a serializer for xml.dom, is no longer maintained, AND
- minidom does not support serialization of namespaces, even though namespaces are supported in the API
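To make the last point concrete, here is a minimal sketch of what I see with minidom. The namespace URI and the "ex" prefix are made up for illustration, and it is quite possible I am using the API incorrectly:

```python
from xml.dom import minidom

NS = "http://example.com/ns"  # hypothetical namespace URI, for illustration only

doc = minidom.Document()
# The DOM Level 2 namespace-aware call is accepted, and the node records the URI...
el = doc.createElementNS(NS, "ex:root")
doc.appendChild(el)

# ...but the serialized output carries no xmlns:ex declaration for the prefix.
serialized = doc.toxml()
print(el.namespaceURI)
print(serialized)
```

As far as I can tell, the only way to get a declaration into the output is to add the xmlns attribute by hand, which is what I mean by namespaces not being supported in serialization.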
I looked through the list of other W3C-like libraries here:
http://wiki.python.org/moin/PythonXml#W3CDOM-likelibraries
I found that many other libraries, such as 4Suite and libxml2dom, are also not maintained.
On the other hand, itools at first glance appears to be maintained, but there does not appear to be an Ubuntu/Debian package available, so it would be difficult to deploy and maintain.
At this point, it seemed that trying to use the W3C DOM APIs in my Python application was going to be a dead end, and I began to look at the ElementTree API. But the way the ElementTree API supports namespaces strikes me as horribly ugly, since it requires string concatenation every time an element in a particular namespace is created:
http://codespeak.net/lxml/tutorial.html#namespaces
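For illustration, this is roughly the pattern I am objecting to, sketched here with the standard library's xml.etree.ElementTree rather than lxml. The namespace URI, the "ex" prefix, and the element names are hypothetical:

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"  # hypothetical namespace URI, for illustration only

# Optional: choose the prefix used on output instead of an auto-generated one.
ET.register_namespace("ex", NS)

# Every namespaced tag has to be spelled as "{uri}localname",
# so the URI gets concatenated into each element name.
root = ET.Element("{" + NS + "}root")
child = ET.SubElement(root, "{" + NS + "}child")
child.text = "hello"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Serialization does emit the namespace declaration in this case, so my complaint is about the ergonomics of building the tag names, not about the output.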
So, my question is, have I overlooked something, or is support for XML (in particular W3C DOM) actually quite bad in Python?
EDIT
Here follows a list of more precise questions, the answers to which would really help me:
- Is there reasonable support for W3C DOM in Python?
- If not xml.dom, do you use e.g. etree instead of W3C DOM?
- If so, which library is best, and how do you overcome the issues regarding namespacing in the API?
- If you use W3C DOM instead, are you aware of a library that implements serialization with support for namespaces?