views:

483

answers:

4

Working in python I want to extract a dataset with the following structure:

Each item has a unique ID and the unique ID of its parent. Each parent can have one or more children, each of which can have one or more children of its own, to n levels i.e. the data has an upturned tree-like structure. While it has the potential to go on for infinity, in reality a depth of 10 levels is unusual, as is having more than 10 siblings at each level.

For each item in the dataset I want to show show all items for which this item is their parent... and so on until it reaches the bottom of the dataset.

Doing the first two levels is easy, but I'm unsure how to make it efficiently recurs down through the levels.

Any pointers very much appreciated.

A: 

Are you saying that each item only maintains a reference to its parents? If so, then how about

def getChildren(item) :
    children = []
    for possibleChild in allItems :
        if (possibleChild.parent == item) :
            children.extend(getChildren(possibleChild))
    return children

This returns a list that contains all items who are in some way descended from item.

PFHayes
Am I correct in thinking that this would return a big list of items that were descendants of a given item, from all levels, but that you would lose the structure of the dataset?
notreadbyhumans
So that's the same as this one-line list comprehension: "def getChildren(item) : return [ getChildren(child) for child in allItems if child.parent == item]"I've never seen a list comprehension with recursion before.
hughdbrown
+1  A: 

If you want to keep the structure of your dataset, this will produce a list of the format [id, [children of id], id2, [children of id2]]

def children(id):                                                                         
    return [id]+[children(x.id) for x in filter(lambda x:x.parent == id, items)]
ACoolie
+1  A: 

you should probably use a defaultdictionary for this:

from collections import defaultdict    

itemdict = defaultdict(list)
for id, parent_id in itemlist:
   itemdict[parent_id].append(id)

then you can recursively print it (with indentation) like

def printitem(id, depth=0):
    print '  '*depth, id
    for child in itemdict[id]:
        printitem(child, depth+1)
Jimmy
A: 

How about something like this,


#!/usr/bin/python                                                                                                            

tree = { 0:(None, [1,2,3]),
         1:(0, [4]),
         2:(0, []),
         3:(0, [5,6]),
         4:(1, [7]),
         5:(3, []),
         6:(3, []),
         7:(4, []),
         }

def find_children( tree, id ):
    print "node:", id, tree[id]
    for child in tree[id][1]:
        find_children( tree, child )

if __name__=="__main__":
    import sys
    find_children( tree, int(sys.argv[1]) )

$ ./tree.py 3
node: 3 (0, [5, 6])
node: 5 (3, [])
node: 6 (3, [])

It's also worth noting that python has a pretty low default recursion limit, 1000 I think.

In the event that your tree actually gets pretty deep you'll hit this very quickly. You can crank this up with,


sys.setrecursionlimit(100000)

and check it with,


sys.getrecursionlimit()
blackkettle