views:

84

answers:

3

I need to use Python 2.4.4 to convert XML to and from a Python dictionary. All I need are the node names and values, I'm not worried about attributes because the XML I'm parsing doesn't have any. I can't use ElementTree because that isn't available for 2.4.4, and I can't use 3rd party libraries due to my work environment. What's the easiest way for me to do this? Are there any good snippets?

Also, if there isn't an easy way to do this, are there any alternative serialization formats that Python 2.4.4 has native support for?

+1  A: 

Question http://stackoverflow.com/questions/1019895/serialize-python-dictionary-to-xml lists some ways of XML serialization. As for alternative serialization formats, I guess pickle module is a nice tool for it.

Grey Teardrop
On that question 3/4 of those are external libraries, which isn't an option for me. The last one marshalizes an XML document, which isn't really what I'm looking for, but it does seem to convert it into a dictionary first so I'll take a look.
Liam
+1  A: 

Grey's link includes some solutions that look pretty robust. If you want to roll your own though, you could use xml.dom.node's childNode member recursively, terminating when node.childNode = None.

altie
+1  A: 

I recently wrote some code to translate XML into a python data structure, although I did have to handle attributes. I used xml.dom.minidom rather than ElementTree, for a similar reason. I haven't actually tested this on Python 2.4.4, but I think it will work. I didn't write a reverse XML generator, though you can probably use the 'lispy_string' function I included to do this.

I also included some shortcuts specific to the application I was writing (explained in the docstring), but you might find those shortcuts useful too, from the sounds of it. Essentially, an xml tree technically translates into a dictionary of lists of dictionaries of lists of dictionaries of lists, etc. I omit creating the intermediary lists unless they are necessary, so you can reference elements by dictname[element1][element2] rather than dictname[element1][0][element2][0] and so on.

Attribute handling is a little kludgy, I strongly recommend reading the code before doing anything with attributes.

import sys
from xml.dom import minidom

def dappend(dictionary, key, item):
    """Append item to dictionary at key.  Only create a list if there is more than one item for the given key.
    dictionary[key]=item if key doesn't exist.
    dictionary[key].append(item) if key exists."""
    if key in dictionary.keys():
        if not isinstance(dictionary[key], list):
            lst=[]
            lst.append(dictionary[key])
            lst.append(item)
            dictionary[key]=lst
        else:
            dictionary[key].append(item)
    else:
        dictionary.setdefault(key, item)

def node_attributes(node):
    """Return an attribute dictionary """
    if node.hasAttributes():
        return dict([(str(attr), str(node.attributes[attr].value)) for attr in node.attributes.keys()])
    else:
        return None

def attr_str(node):
    return "%s-attrs" % str(node.nodeName)

def hasAttributes(node):
    if node.nodeType == node.ELEMENT_NODE:
        if node.hasAttributes():
            return True
    return False

def with_attributes(node, values):
    if hasAttributes(node):
        if isinstance(values, dict):
            dappend(values, '#attributes', node_attributes(node))
            return { str(node.nodeName): values }
        elif isinstance(values, str):
            return { str(node.nodeName): values,
                     attr_str(node): node_attributes(node)}
    else:
        return { str(node.nodeName): values }

def xmldom2dict(node):
    """Given an xml dom node tree,
    return a python dictionary corresponding to the tree structure of the XML.
    This parser does not make lists unless they are needed.  For example:

    '<list><item>1</item><item>2</item></list>' becomes:
    { 'list' : { 'item' : ['1', '2'] } }
    BUT
    '<list><item>1</item></list>' would be:
    { 'list' : { 'item' : '1' } }

    This is a shortcut for a particular problem and probably not a good long-term design.
    """
    if not node.hasChildNodes():
        if node.nodeType == node.TEXT_NODE:
            if node.data.strip() != '':
                return str(node.data.strip())
            else:
                return None
        else:
            return with_attributes(node, None)
    else:
        #recursively create the list of child nodes
        childlist=[xmldom2dict(child) for child in node.childNodes if (xmldom2dict(child) != None and child.nodeType != child.COMMENT_NODE)]
        if len(childlist)==1:
            return with_attributes(node, childlist[0])
        else:
            #if False not in [isinstance(child, dict) for child in childlist]:
            new_dict={}
            for child in childlist:
                if isinstance(child, dict):
                    for k in child:
                        dappend(new_dict, k, child[k])
                elif isinstance(child, str):
                    dappend(new_dict, '#text', child)
                else:
                    print "ERROR"
            return with_attributes(node, new_dict)

def load(fname):
    return xmldom2dict(minidom.parse(fname))

def lispy_string(node, lst=None, level=0):
    if lst==None:
        lst=[]
    if not isinstance(node, dict) and not isinstance(node, list):
        lst.append(' "%s"' % node)
    elif isinstance(node, dict):
        for key in node.keys():
            lst.append("\n%s(%s" % (spaces(level), key))
            lispy_print(node[key], lst, level+2)
            lst.append(")")
    elif isinstance(node, list):
        lst.append(" [")
        for item in node:
            lispy_print(item, lst, level)
        lst.append("]")
    return lst

if __name__=='__main__':
    data = minidom.parse(sys.argv[1])

    d=xmldom2dict(data)

    print d
Goladus