ansaurus

Question

Python XML - build flat record from dynamic nested "node" elements

Answer 1

+4 A:

That's what ElementTree's find and findall methods are for: they accept a limited XPath-style path, so each nesting level can be addressed directly.

class Plan( object ):
    def __init__( self ):
        self.srv= None
        self.sub= None
        self.plan= None
        self.group= None
        self.subgroup= None
        self.defrate= None
        self.altrate= None
    def initFrom( self, other ):
        self.srv= other.srv
        self.sub= other.sub
        self.plan= other.plan
        self.group= other.group
        self.subgroup= other.subgroup
    def __str__( self ):
        return "%s %s %s %s %s %s %s" % (
            self.srv, self.sub, self.plan, self.group, self.subgroup,
            self.defrate, self.altrate )

def setRates( obj, aSearch ):
    for rate in aSearch:
        if rate.text.strip() == "default":
            obj.defrate= rate.find("rate").text.strip()
        elif rate.text.strip() == "alternative":
            obj.altrate= rate.find("rate").text.strip()
        else:
            raise Exception( "Unexpected Structure" )

def planIter( doc ):
    for topNode in doc.findall( "node" ):
        obj= Plan()
        obj.srv= topNode.text.strip()
        subNode= topNode.find("node")
        obj.sub= subNode.text.strip()
        planNode= topNode.find("node/node")
        obj.plan= planNode.text.strip()
        l3= topNode.find("node/node/node")
        if l3.text.strip() in ( "default", "alternative" ):
            setRates( obj, topNode.findall("node/node/node") )
            yield obj
        else:
            for group in topNode.findall("node/node/node"):
                grpObj= Plan()
                grpObj.initFrom( obj )
                grpObj.group= group.text.strip()
                l4= group.find( "node" )
                if l4.text.strip() in ( "default", "alternative" ):
                    setRates( grpObj, group.findall( "node" ) )
                    yield grpObj
                else:
                    for subgroup in group.findall("node"):
                        subgrpObj= Plan()
                        subgrpObj.initFrom( grpObj )
                        subgrpObj.subgroup= subgroup.text.strip()
                        setRates( subgrpObj, subgroup.findall("node") )
                        yield subgrpObj

import xml.etree.ElementTree as xml

# "doc" starts out as the XML source string and is rebound to the parsed root
doc = xml.XML( doc )

for plan in planIter( doc ):
    print( plan )

Edit

Whoever gave you this XML document needs to find another job. This is A Bad Thing (TM) and indicates a fairly casual disregard for what XML means.

S.Lott 2009-03-27 11:34:07

Thanks for the quick response. The node names are all "node" and unfortunately as I stated earlier, I can't assume that "subgroup" is the last level, otherwise this would have been very easy. The node depth is not static. There could be children of the subgroup "node". Thoughts? Thanks again!

John 2009-03-27 12:27:01

Also - each "level" can have multiple sub-levels. So creating a single object just from the top node loop won't work. You need to loop on each child "node" as well to build each record.

John 2009-03-27 12:30:56

@S.Lott - I couldn't agree more on the XML structure but unfortunately, it is "system generated" and they refuse to change it. :-(

John 2009-03-27 13:59:06

@John: They're incompetent. It's trivially changeable if they would simply subclass the DOM objects properly. Seriously. What they're doing is A Bad Thing on two levels -- it's wrong and they're refusing to change.

S.Lott 2009-03-27 14:21:43

@S.Lott - Ha! Again, couldn't agree more. When I explained the poor XML structure and the need to change, they just looked at me w/ blank faces and said "that can't be changed". Oh well -- nothing like building a complex solution to deal w/ a poor design. Thanks again.

John 2009-03-27 15:43:03

Answer 2

A:

I'm not too familiar with the ElementTree module, but you should be able to walk an element's children (older versions offer a getchildren() method; it was removed in Python 3.9, where you iterate over the element directly) and recursively parse data until there are no more children. This is more pseudocode than anything:

from xml.etree.ElementTree import parse

def parseXml(root, data):
    # INSERT CODE to populate your data object here with the values
    # you want from this node
    for node in root:  # iterating an element yields its direct children
        parseXml(node, data)

data = {}  # I'm guessing you want a dict of some sort here to store the data you parse
parseXml(parse(file).getroot(), data)  # "file" is the path to your XML file
# data will be filled and ready to use
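
For what it's worth, here is a rough sketch of how that recursion might produce flat records when the depth isn't fixed. The sample XML is invented for illustration (the real document isn't shown here), and the names SAMPLE, RATE_KINDS, label and flatten are made up as well. It assumes that every "node" carries its label as text and that the "default"/"alternative" nodes each hold a "rate" child, as in the first answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real document's layout is an assumption.
SAMPLE = (
    "<root>"
    "<node>Service A"
    "<node>Sub 1"
    "<node>Plan X"
    "<node>default<rate>1.0</rate></node>"
    "<node>alternative<rate>2.0</rate></node>"
    "</node>"
    "<node>Plan Y"
    "<node>Group 1"
    "<node>default<rate>3.0</rate></node>"
    "</node>"
    "</node>"
    "</node>"
    "</node>"
    "</root>"
)

RATE_KINDS = ("default", "alternative")


def label(elem):
    """Return the element's own text, stripped (empty string if missing)."""
    return (elem.text or "").strip()


def flatten(node, path=()):
    """Yield (path, rates) for every node that directly holds rate nodes."""
    path = path + (label(node),)
    children = node.findall("node")
    # Collect the rate children found at this level, if any.
    rates = {
        label(child): child.findtext("rate", default="").strip()
        for child in children
        if label(child) in RATE_KINDS
    }
    if rates:
        yield path, rates
    # Recurse into every non-rate child, whatever the depth.
    for child in children:
        if label(child) not in RATE_KINDS:
            yield from flatten(child, path)


root = ET.fromstring(SAMPLE)
records = [rec for top in root.findall("node") for rec in flatten(top)]
for path, rates in records:
    print(" / ".join(path), rates)
```

Each record carries the full chain of labels from the top node down, so a subgroup with further children just yields a longer path instead of needing another hard-coded level.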

jcoon 2009-03-27 12:48:54
