ansaurus

Question

Parsing xml file in python

Answer 1

+2 A:

I would recommend using the minidom library.

The docs are pretty good so you should be up and running in no time.

Dan.

freeasinbeer 2009-12-15 16:00:45

Answer 2

A:

Why not try something like the PyXML library? They have lots of documentation and tutorials.

Gordon 2009-12-15 16:02:45

**WARNING** Norwegian Blue Parrot syndrome: Last release 5 years ago. No Windows installers for Python 2.5 and 2.6.

John Machin 2009-12-16 21:27:20

Answer 3

A:

Another XML parsing library: http://www.crummy.com/software/BeautifulSoup/

Parsing XML documentation starts here: http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20XML

The MYYN 2009-12-15 16:04:45

Answer 4

A:

As others have stated, minidom is the way to go here. You open (and parse) the file and, while going through the nodes, check whether each one is relevant and should be read. That way, you also know whether you want to read its child nodes.

I threw this together; it seems to do what you want. Some of the values are read by attribute position rather than attribute name, and there's no error handling. The print() at the end means it's Python 3.x.

I'll leave it as an exercise to improve upon that, just wanted to post a snippet to get you started.

Happy hacking! :)

xml.txt

<doc>
<id name="X">
  <type name="A">
    <min val="100" id="80"/>
    <max val="200" id="90"/>
   </type>
  <type name="B">
    <min val="100" id="20"/>
    <max val="20" id="90"/>
  </type>
</id>
</doc>

parsexml.py

from xml.dom import minidom

data = {}
doc = minidom.parse("xml.txt")
# Walk <doc> -> <id> -> <type> -> <min>/<max>; text nodes have no localName and are skipped.
for n in doc.childNodes[0].childNodes:
    if n.localName == "id":
        id_name = n.attributes.item(0).nodeValue
        data[id_name] = {}
        for j in n.childNodes:
            if j.localName == "type":
                type_name = j.attributes.item(0).nodeValue
                data[id_name][type_name] = [(), ()]
                for k in j.childNodes:
                    # Attributes are read by position here, not by name.
                    if k.localName == "min":
                        data[id_name][type_name][0] = (
                            k.attributes.item(1).nodeValue,
                            k.attributes.item(0).nodeValue)
                    if k.localName == "max":
                        data[id_name][type_name][1] = (
                            k.attributes.item(1).nodeValue,
                            k.attributes.item(0).nodeValue)
print(data)

Output:

{'X': {'A': [('100', '80'), ('200', '90')], 'B': [('100', '20'), ('20', '90')]}}
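
For comparison, here is a sketch of the same traversal that looks attributes up by name with getAttribute. That way the result does not depend on the order in which minidom stores the attributes, which is not something to rely on and may differ between Python versions. The helper names element_children and parse_by_name are made up for this illustration, and the inline SAMPLE string simply mirrors xml.txt so the snippet runs on its own.

```python
from xml.dom import minidom

# Inline copy of the sample from xml.txt, so the sketch runs without a file.
SAMPLE = """<doc>
<id name="X">
  <type name="A">
    <min val="100" id="80"/>
    <max val="200" id="90"/>
  </type>
  <type name="B">
    <min val="100" id="20"/>
    <max val="20" id="90"/>
  </type>
</id>
</doc>"""


def element_children(node, tag):
    """Yield direct child elements of node named tag (whitespace text nodes are skipped)."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag:
            yield child


def parse_by_name(xml_text):
    """Hypothetical helper: build {id: {type: [(min val, min id), (max val, max id)]}}."""
    doc = minidom.parseString(xml_text)
    data = {}
    for id_node in element_children(doc.documentElement, "id"):
        types = data[id_node.getAttribute("name")] = {}
        for type_node in element_children(id_node, "type"):
            pair = [(), ()]
            # Slot 0 holds the <min> values, slot 1 the <max> values.
            for slot, tag in enumerate(("min", "max")):
                for k in element_children(type_node, tag):
                    pair[slot] = (k.getAttribute("val"), k.getAttribute("id"))
            types[type_node.getAttribute("name")] = pair
    return data


data = parse_by_name(SAMPLE)
print(data)
```

Missing attributes come back as empty strings from getAttribute, so real input would still want some validation on top of this.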

mizipzor 2009-12-15 16:38:07

Sorry, wrong room. The fugly code competition is down the hall.

John Machin 2009-12-15 21:31:51

Answer 5

+7 A:

I disagree with the suggestion in other answers to use minidom -- that's a so-so Python adaptation of a standard originally conceived for other languages, usable but not a great fit. The recommended approach in modern Python is ElementTree.

The same interface is also implemented, faster, in the third-party module lxml, but unless you need blazing speed the version included with the Python standard library is fine (and faster than minidom anyway) -- the key point is to program to that interface, so that you can always switch to a different implementation of the same interface in the future if you want to, with minimal changes to your own code.
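
One common way to "program to the interface" is an import fallback: try lxml first and drop back to the standard library if it is not installed. This is only a sketch, and it assumes the rest of the code sticks to the subset both modules share (fromstring, parse, iterating over an element, get); the two are not identical in every detail. The tiny inline document is made up for the demonstration.

```python
# Prefer lxml when it is installed; otherwise use the standard library version.
try:
    from lxml import etree as et
except ImportError:
    from xml.etree import ElementTree as et

# Everything below only uses calls that both implementations provide.
root = et.fromstring('<doc><id name="X"><type name="A"/></id></doc>')
names = [child.get("name") for child in root]
print(names)
```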

For example, after the needed imports &c, the following code is a minimal implementation of your example (it does not verify that the XML is correct, just extracts the data assuming correctness -- adding various kinds of checks is pretty easy of course):

from xml.etree import ElementTree as et  # or, import any other, faster version of ET

def xml2data(xmlfile):
  tree = et.parse(xmlfile)
  data = {}
  for anid in tree.getroot().getchildren():
    currdict = data[anid.get('name')] = {}
    for atype in anid.getchildren():
      currlist = currdict[atype.get('name')] = []
      for c in atype.getchildren():
        currlist.append((c.get('val'), c.get('id')))
  return data

This produces your desired result given your sample input.
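
On current Python versions the function above needs one adjustment: Element.getchildren() was deprecated for a long time and was removed in Python 3.9, while iterating over an element directly yields its children. Below is a sketch of the same function with that change. The io.StringIO sample is only there so the snippet runs without a file on disk; et.parse accepts a filename or a file object.

```python
import io
from xml.etree import ElementTree as et


def xml2data(xmlfile):
    """Return {id name: {type name: [(val, id), ...]}} for the sample layout."""
    tree = et.parse(xmlfile)
    data = {}
    for anid in tree.getroot():          # iterate children directly
        currdict = data[anid.get('name')] = {}
        for atype in anid:
            currdict[atype.get('name')] = [
                (c.get('val'), c.get('id')) for c in atype
            ]
    return data


# In-memory stand-in for the sample input file.
sample = io.StringIO("""<doc>
<id name="X">
  <type name="A">
    <min val="100" id="80"/>
    <max val="200" id="90"/>
  </type>
  <type name="B">
    <min val="100" id="20"/>
    <max val="20" id="90"/>
  </type>
</id>
</doc>""")

result = xml2data(sample)
print(result)
```

As in the original, nothing here checks tag names or missing attributes; c.get() returns None when an attribute is absent.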

Alex Martelli 2009-12-15 17:18:26

`for child in node.getchildren():` is unnecessary; use `for child in node:` instead.

John Machin 2009-12-16 21:27:59

Answer 6

A:

Do not reinvent the wheel. Use the Amara toolkit. Variable names are just keys in a dictionary anyway. http://www.xml3k.org/Amara

Hamish Grubijan 2009-12-15 21:25:13

Another link: http://www.xml.com/pub/a/2005/01/19/amara.html

You will end up with a variable doc, which has doc.id, which has doc.id.type[0], then doc.id.type[0].min, and so on. Super easy to access!

Hamish Grubijan 2009-12-15 21:33:02
