ansaurus

Question

Really simple way to deal with XML in Python?

Answer 1

+2 A:

If you don't mind using a 3rd party library, then BeautifulSoup will do almost exactly what you ask for:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup('''<snip>''')
>>> soup.xml_api_reply.weather.current_conditions.temp_f['data']
u'68'

Mike Boers 2010-06-24 00:29:24

I already looked at Beautiful[Stone]Soup - but it is **broken** (as documented http://www.crummy.com/software/BeautifulSoup/documentation.html ) for empty tags like <element/> - which are aplenty in the example. For example `soup.xml_api_reply.weather.current_conditions.icon` returns `<icon data="/ig/images/weather/partly_cloudy.gif"><wind_condition data="Wind: N at 25 mph"></wind_condition></icon>` or you can get `temp_c` via `soup.xml_api_reply.weather.current_conditions.condition.temp_f.temp_c['data']` which seems demented to me

Nas Banov 2010-06-24 00:43:37

It will work if 1) you know what empty tags you're looking for, and 2) they can be relied upon to be empty. Then you can specify them as an argument to the parser: selfClosingTags=['city','postal_code', ...]

Owen S. 2010-06-24 01:10:18

@Owen S: <nod>, selfClosingTags indeed helped with the StoneSoup but one shouldn't have to do that really. The example is chock full of empty tags (that should have been attributes ... but aren't) - and so is the case in many XMLs

Nas Banov 2010-06-24 03:40:35

Answer 2

A:

If you haven't already, I'd suggest looking into the DOM API for Python. DOM is a pretty widely used XML interpretation system, so it should be pretty robust.

It's probably a little more complicated than what you describe, but that comes from its attempts to preserve all the information implicit in XML markup rather than from bad design.

tlayton 2010-06-24 00:30:35

The question is specifically for easy, pythonic XML access. DOM is many things (not _all_ of which are evil, I admit), but "easy" and "pythonic" it most certainly is _not_. Reverting to DOM for interacting with XML is like dropping to C (or worse, assembly) for a webapp -- it should be done rarely and only for remarkably good reason.

Nicholas Knight 2010-06-24 02:18:20

And I'd also like to note that's not because of preserving XML structure; it's because that library tries hard to adhere to a cross-language API for its interface. There are more Pythonic libraries that preserve the XML structure quite precisely.

Owen S. 2010-06-24 21:03:08

Answer 3

+3 A:

Take a look at Amara 2, particularly the Bindery part of this tutorial.

It works in a way pretty similar to what you describe.

On the other hand. ElementTree's find*() methods can give you 90% of that and are packaged with Python.

Walter Mundt 2010-06-24 00:35:37

I looked and indeed `amara.bindery` does what I am looking for - but it seems way too big (600k installer, 3MB source) - it's like someone said, I wanted a banana but now I got a "free" gorilla with it. Re ElementTree find*() - it's close but lacks that pythonic []/iterator veneer i was thinking about

Nas Banov 2010-06-24 03:17:44

Answer 4

A:

I believe that the built in python xml module will do the trick. Look at "xml.parsers.expat"

xml.parsers.expat

iform 2010-06-24 01:02:44

A low-level SAXlike parser interface provides a Pythonic object interface to the parsed XML document?? What am I missing?

Owen S. 2010-06-24 02:03:04

Answer 5

+3 A:

I highly recommend lxml.etree and xpath to parse and analyse your data. Here is a complete example. I have truncated the xml to make it easier to read.

import lxml.etree

s = """<?xml version="1.0" encoding="utf-8"?>
<xml_api_reply version="1">
  <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" >
    <forecast_information>
      <city data="Mountain View, CA"/> <forecast_date data="2010-06-23"/>
    </forecast_information>
    <forecast_conditions>
      <day_of_week data="Sat"/>
      <low data="59"/>
      <high data="75"/>
      <icon data="/ig/images/weather/partly_cloudy.gif"/>
      <condition data="Partly Cloudy"/>
    </forecast_conditions>
  </weather>
</xml_api_reply>"""

tree = lxml.etree.fromstring(s)
for weather in tree.xpath('/xml_api_reply/weather'):
    print weather.find('forecast_information/city/@data')[0]
    print weather.find('forecast_information/forecast_date/@data')[0]
    print weather.find('forecast_conditions/low/@data')[0]
    print weather.find('forecast_conditions/high/@data')[0]

Jerub 2010-06-24 01:20:49

Seems fairly easy indeed and i am taking note - but it is more xpath-ish (xpathologic?) than pythonic.

Nas Banov 2010-06-29 16:57:59

Answer 6

+4 A:

You want a thin veneer? That's easy to cook up. Try the following trivial wrapper around ElementTree as a start:

# geetree.py
import xml.etree.ElementTree as ET

class GeeElem(object):
    """Wrapper around an ElementTree element. a['foo'] gets the
       attribute foo, a.foo gets the first subelement foo."""
    def __init__(self, elem):
        self.etElem = elem

    def __getitem__(self, name):
        res = self._getattr(name)
        if res is None:
            raise AttributeError, "No attribute named '%s'" % name
        return res

    def __getattr__(self, name):
        res = self._getelem(name)
        if res is None:
            raise IndexError, "No element named '%s'" % name
        return res

    def _getelem(self, name):
        res = self.etElem.find(name)
        if res is None:
            return None
        return GeeElem(res)

    def _getattr(self, name):
        return self.etElem.get(name)

class GeeTree(object):
    "Wrapper around an ElementTree."
    def __init__(self, fname):
        self.doc = ET.parse(fname)

    def __getattr__(self, name):
        if self.doc.getroot().tag != name:
            raise IndexError, "No element named '%s'" % name
        return GeeElem(self.doc.getroot())

    def getroot(self):
        return self.doc.getroot()

You invoke it so:

>>> import geetree
>>> t = geetree.GeeTree('foo.xml')
>>> t.xml_api_reply.weather.forecast_information.city['data']
'Mountain View, CA'
>>> t.xml_api_reply.weather.current_conditions.temp_f['data']
'68'

Owen S. 2010-06-24 01:53:17

Answer 7

+4 A:

lxml has been mentioned. You might also check out lxml.objectify for some really simple manipulation.

>>> from lxml import objectify
>>> tree = objectify.fromstring(your_xml)
>>> tree.weather.attrib["module_id"]
'0'
>>> tree.weather.forecast_information.city.attrib["data"]
'Mountain View, CA'
>>> tree.weather.forecast_information.postal_code.attrib["data"]
'94043'

Ryan Ginstrom 2010-06-24 02:27:54

++. Does what asked for, although like in the Amara case, free gorilla (a ton of non-distro library) comes with the order of banana. Btw, seems also can use `.get('data')` instead of `.attrib['data']`

Nas Banov 2010-06-29 19:48:36

Answer 8

A:

The suds project provides a Web Services client library that works almost exactly as you describe -- provide it a wsdl and then use factory methods to create the defined types (and process the responses too!).

d.w. 2010-06-24 03:01:17

Ok, that's interesting... but doesn't the name **suds** imply this is only for use with **SOAP**? The example above is not SOAPy and i won't want to go the slippery-SOAP. Also, where do I find WSDL of - for example - the weather service above?

Nas Banov 2010-06-24 03:28:21

Yes, you're right -- suds is definitely geared towards SOAP web-services rather than generic XML. A WSDL would be the published contract for that kind of service. My bad, I had assumed you had rinsed the bubbles from that example prior to posting :-)

d.w. 2010-06-24 12:46:41

Answer 9

A:

I found the following python-simplexml module, which in the attempts of the author to get something close to SimpleXML from PHP is indeed a small wrapper around ElementTree. It's under 100 lines but seems to do what was requested:

>>> import SimpleXml
>>> x = SimpleXml.parse(urllib.urlopen('http://www.google.com/ig/api?weather=94043'))
>>> print x.weather.current_conditions.temp_f['data']
58

Nas Banov 2010-06-24 05:08:22

ansaurus

tags:

views:

answers:

Really simple way to deal with XML in Python?

related questions