ansaurus

Question

Trying to parse an XML file with Python - what am I doing wrong?

Answer 1

+2 A:

Does this help?

doc = '''<Response>
    <exitCode>1</exitCode>
    <fileName>C:/Something/</fileName>
    <errors>
        <error>Error generating report</error>
    </errors>
</Response>'''

from xml.dom import minidom

something = minidom.parseString( doc )

nodeList = [ ]
for node in something.getElementsByTagName( "Response" ):
    response = { }
    response[ "exit code" ] = node.getElementsByTagName( "exitCode" )[ 0 ].childNodes[ 0 ].nodeValue
    response[ "file name" ] = node.getElementsByTagName( "fileName" )[ 0 ].childNodes[ 0 ].nodeValue
    errors = node.getElementsByTagName( "errors" )[ 0 ].getElementsByTagName( "error" )
    response[ "errors" ] = [ error.childNodes[ 0 ].nodeValue for error in errors ]

    nodeList.append( response )

import pprint
pprint.pprint( nodeList )

yields

[{'errors': [u'Error generating report'],
  'exit code': u'1',
  'file name': u'C:/Something/'}]

katrielalex 2010-08-26 14:51:39

This works for this specific example, but I was looking for something a bit more generic that could parse any xml file into some sort of Python structure...

froadie 2010-08-26 14:55:17

@froadie: That function already exists; it's called `xml.dom.minidom.parse`. There is no builtin Python structure which can adequately represent XML, so it defines its own structures.

katrielalex 2010-08-26 14:59:01

This is a great solution to the problem. If you want something more robust, you might want to go with `xml.etree.ElementTree` over `xml.dom.minidom`, because it tends to involve less syntax with nested traversals.

jathanism 2010-08-26 15:16:16

I ended up using this... thanks!

froadie 2010-09-14 14:38:52

Answer 2

A:

You're not thinking about the XML from the DOM standpoint. Namely, 'C:/Something' isn't the nodevalue of the element whose tagname is 'fileName'; it's the nodevalue of the text node that is the first child of the element whose tagname is 'fileName'.

What I recommend you do is play around with it a little more in python itself: start python.

from xml.dom import minidom

x = minidom.parseString('<Response><filename>C:/</filename>>')

x.getElementsByTagName('Response') ... x.getElementsByTagName('Response')[0].childNodes[0] ...

and so forth. You'll get a quick appreciation for how the document is being parsed.

Brian 2010-08-26 14:54:28

Answer 3

+3 A:

If you are just starting out with xml and python, and have no compelling reason to use DOM, I strongly suggest you have a look at the ElementTree api (implemented in the standard library in xml.etree.ElementTree)

To give you a taste:

import xml.etree.cElementTree as etree

tree = etree.parse('C:/XMLTest.xml')
response = tree.getroot()
exitcode = response.find('exitCode').text
filename = response.find('fileName').text
errors = [i.text for i in response.find('errors')]

(if you need more power - xpath, validation, xslt etc... - you can even switch to lxml, which implements the same API, but with a lot of extras)

Steven 2010-08-26 15:30:38

Answer 4

A:

I recommend my library xml2obj. It is way cleaner than DOM. The "library" has only 84 lines of code you can embed anywhere.

In [185]: resp = xml2obj(something)

In [186]: resp.exitCode
Out[186]: u'1'

In [187]: resp.fileName
Out[187]: u'C:/Something/'

In [188]: len(resp.errors)
Out[188]: 1

In [189]: for node in resp.errors:
   .....:     print node.error
   .....:
   .....:
Error generating report

Wai Yip Tung 2010-08-26 15:31:30

ansaurus

tags:

views:

answers:

Trying to parse an XML file with Python - what am I doing wrong?

related questions