I'm working with XML and Python for the first time. The ultimate goal is to send a request to a REST service, receive a response in XML, and parse the values and send emails depending on what was returned. However, the REST service is not yet in place, so for now I'm experimenting with an XML file saved on my C drive.

I have a simple bit of code, and I'm confused about why it isn't working.

This is my xml file ("XMLTest.xml"):

<Response>
    <exitCode>1</exitCode>
    <fileName>C:/Something/</fileName>
    <errors>
        <error>Error generating report</error>
    </errors>
</Response>

This is my code so far:

from xml.dom import minidom

something = open("C:/XMLTest.xml")
something = minidom.parse(something)

nodeList = []
for node in something.getElementsByTagName("Response"):  
    nodeList.extend(t.nodeValue for t in node.childNodes)
print nodeList

But the results that print out are...

[u'\n\t', None, u'\n\t', None, u'\n\t', None, u'\n']

What am I doing wrong?

I'm trying to get the node values. Is there a better way to do this? Is there a built-in method in Python to convert an xml file to an object or dictionary? I'd like to get all the values, preferably with the names attached.

+2  A: 

Does this help?

doc = '''<Response>
    <exitCode>1</exitCode>
    <fileName>C:/Something/</fileName>
    <errors>
        <error>Error generating report</error>
    </errors>
</Response>'''

from xml.dom import minidom

something = minidom.parseString( doc )

nodeList = [ ]
for node in something.getElementsByTagName( "Response" ):
    response = { }
    response[ "exit code" ] = node.getElementsByTagName( "exitCode" )[ 0 ].childNodes[ 0 ].nodeValue
    response[ "file name" ] = node.getElementsByTagName( "fileName" )[ 0 ].childNodes[ 0 ].nodeValue
    errors = node.getElementsByTagName( "errors" )[ 0 ].getElementsByTagName( "error" )
    response[ "errors" ] = [ error.childNodes[ 0 ].nodeValue for error in errors ]

    nodeList.append( response )

import pprint
pprint.pprint( nodeList )

yields

[{'errors': [u'Error generating report'],
  'exit code': u'1',
  'file name': u'C:/Something/'}]
katrielalex
This works for this specific example, but I was looking for something a bit more generic that could parse any xml file into some sort of Python structure...
froadie
@froadie: That function already exists; it's called `xml.dom.minidom.parse`. There is no builtin Python structure which can adequately represent XML, so it defines its own structures.
katrielalex
This is a great solution to the problem. If you want something more robust, you might want to go with `xml.etree.ElementTree` over `xml.dom.minidom`, because it tends to involve less syntax with nested traversals.
jathanism
I ended up using this... thanks!
froadie
A: 

You're not thinking about the XML from the DOM standpoint. Namely, 'C:/Something/' isn't the nodeValue of the element whose tag name is 'fileName'; it's the nodeValue of the text node that is the first child of that element.

What I recommend you do is play around with it a little more in Python itself. Start the interactive interpreter and try:

from xml.dom import minidom

x = minidom.parseString('<Response><fileName>C:/</fileName></Response>')

x.getElementsByTagName('Response')
...
x.getElementsByTagName('Response')[0].childNodes[0]
...

and so forth. You'll get a quick appreciation for how the document is being parsed.
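
To make the text-node point concrete, here is a minimal sketch (written for Python 3; the two document strings are made up for illustration). It shows that an element's own nodeValue is None, that the text sits in a child text node, and why iterating over the children of an indented document gives whitespace strings mixed with None.

```python
from xml.dom import minidom, Node

# Illustrative document, not the file from the question.
doc = minidom.parseString(
    '<Response><fileName>C:/Something/</fileName></Response>'
)
file_name = doc.getElementsByTagName('fileName')[0]

# An element node carries no value of its own.
element_value = file_name.nodeValue

# The text lives in a child node of type TEXT_NODE.
text_node = file_name.firstChild
text_value = text_node.nodeValue
is_text = text_node.nodeType == Node.TEXT_NODE

# With indentation, the whitespace between tags becomes text nodes too,
# so the children alternate between whitespace text and elements.
pretty = minidom.parseString('<Response>\n  <exitCode>1</exitCode>\n</Response>')
child_values = [child.nodeValue for child in pretty.documentElement.childNodes]

print(element_value, repr(text_value), is_text)
print(child_values)
```

The second print is the same pattern as the list in the question: whitespace text nodes have a string value, element nodes have None.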

Brian
+3  A: 

If you are just starting out with XML and Python, and have no compelling reason to use DOM, I strongly suggest you have a look at the ElementTree API (implemented in the standard library in xml.etree.ElementTree).

To give you a taste:

import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

tree = etree.parse('C:/XMLTest.xml')
response = tree.getroot()
exitcode = response.find('exitCode').text
filename = response.find('fileName').text
errors = [i.text for i in response.find('errors')]

(If you need more power, such as XPath, validation, or XSLT, you can switch to lxml, which implements the same API with a lot of extras.)
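
Since the question asks for something generic with the names attached, here is a rough sketch of how ElementTree could be walked into plain dictionaries. The helper name element_to_dict is hypothetical (it is not part of the standard library), and the sketch assumes Python 3. It ignores attributes and mixed content, and it puts every child under a list so that repeated tags such as several error elements are not lost.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, inlined so the sketch is self-contained.
XML = """<Response>
    <exitCode>1</exitCode>
    <fileName>C:/Something/</fileName>
    <errors>
        <error>Error generating report</error>
    </errors>
</Response>"""


def element_to_dict(element):
    """Hypothetical helper: a leaf becomes its text, anything else a dict of lists."""
    children = list(element)
    if not children:
        # Leaf element: return its text, or an empty string if it has none.
        return (element.text or "").strip()
    result = {}
    for child in children:
        # Group children by tag name; repeated tags accumulate in one list.
        result.setdefault(child.tag, []).append(element_to_dict(child))
    return result


root = ET.fromstring(XML)
data = element_to_dict(root)
print(data)
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring. Whether lists everywhere are convenient depends on the documents you expect; treat this as a starting point, not a general XML-to-dict converter.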

Steven
A: 

I recommend my library xml2obj. It is way cleaner than DOM. The "library" has only 84 lines of code you can embed anywhere.

In [185]: resp = xml2obj(something)

In [186]: resp.exitCode
Out[186]: u'1'

In [187]: resp.fileName
Out[187]: u'C:/Something/'

In [188]: len(resp.errors)
Out[188]: 1

In [189]: for node in resp.errors:
   .....:     print node.error
   .....:
   .....:
Error generating report
Wai Yip Tung