views:

106

answers:

4

Let's say I have an XML file as follows.

<A>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
 <B>
  <C>"blah"</C>
  <C>"blah"</C>
 </B>
</A>

I need to read this file into a dictionary something like this.

dict["A.B1.C1"] = "blah"
dict["A.B1.C2"] = "blah"
dict["A.B2.C1"] = "blah"
dict["A.B2.C2"] = "blah"

But the format of the dict doesn't matter, I just want to read the all the info into the variables of Python.

The thing is that I don't know the structure of the XML, I just want to read all the info in a dictionary.

Is there any way to do this with Python?

+3  A: 

I usually parse XML using the ElemntTree module on the standard library. Itr does not give you a dictionary, you get a much more useful DOM structure which allows you to iterate over each element for children.

from xml.etree import ElementTree as ET

xml = ET.parse("<path-to-xml-file")
root_element = xml.get_root()

for child in root_element:
   ...

If there is specific need to parse it to a dicionary, insetead of getting tehinformation you need from a DOM tree, a recursive function to build one from the root node would be soemething like:

def xml_dict(node, path="", dic =None):
    if dic == None:
        dic = {}
    name_prefix = path + ("." if path else "") + node.tag
    numbers = set()
    for similar_name in dic.keys():
        if similar_name.startswith(name_prefix):
            numbers.add(int (similar_name[len(name_prefix):].split(".")[0] ) )
    if not numbers:
        numbers.add(0)
    index = max(numbers) + 1
    name = name_prefix + str(index)
    dic[name] = node.text + "<...>".join(childnode.tail
                                         if childnode.tail is not None else
                                         "" for childnode in node)
    for childnode in node:
        xml_dict(childnode, name, dic)
    return dic

For the XML you list above this yileds this dictionary:

{'A1': '\n \n <...>\n',
 'A1.B1': '\n  \n  <...>\n ',
 'A1.B1.C1': '"blah"',
 'A1.B1.C2': '"blah"',
 'A1.B2': '\n  \n  <...>\n ',
 'A1.B2.C1': '"blah"',
 'A1.B2.C2': '"blah"'}

(I find the DOM form more usefull)

jsbueno
A: 

I usually use the lxml.objectify library for quick XML parsing.

With your XML string, you can do:

from lxml import objectify
root = objectify.fromstring(xml_string)

And then get individual elements using a dictionary interface:

value = root["A"][0]["B"][0]["C"][0]

Or, if you prefer:

value = root.A[0].B[0].C[0]
Lior
A: 

Check out the answers to http://stackoverflow.com/questions/3106480/really-simple-way-to-deal-with-xml-in-python, you will probably find one of them to directly suit your needs.

Nas Banov