ansaurus

Question

Python: tree structure and numerical codes?

Answer 1

+4 A:

An ad-hoc POD ("plain old data") class representing a tree would do fine, something like:

class Location(object):
  def __init__(self, data, parent):
    self.data = data
    self.parent = parent
    self.children = []

Now assign/read the data attribute, or add/remove children, perhaps with helper methods:

def add_child(self, child):
  self.children.append(child)

Now, to actually divide your data into tree levels, a simple algorithm would be to look at all the places that share a top-level value (such as Africa), assign them a common location, and then repeat recursively for the next level of data.

So, for Africa you create a location with data = Africa. Then, it will have a Location child for North Africa, West Africa and so on.
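A minimal sketch of that grouping step, assuming the input has already been parsed into (continent, region, country) tuples. The names `find_child` and `build_tree` and the sample rows are illustrative, not from the question, and `parent` defaults to None here for the root. It walks each row from the top level down instead of recursing level by level, which should give the same tree:

```python
class Location(object):
    def __init__(self, data, parent=None):
        self.data = data
        self.parent = parent
        self.children = []

    def add_child(self, child):
        self.children.append(child)

    def find_child(self, data):
        # Hypothetical helper: linear search among the direct children.
        for child in self.children:
            if child.data == data:
                return child
        return None


def build_tree(rows):
    """Build a tree from an iterable of name paths, top level first."""
    root = Location(None)
    for path in rows:
        node = root
        for name in path:
            child = node.find_child(name)
            if child is None:
                # First time this name appears under this parent.
                child = Location(name, node)
                node.add_child(child)
            node = child
    return root


rows = [
    ("Africa", "North Africa", "Morocco"),
    ("Africa", "North Africa", "Algeria"),
    ("Africa", "West Africa", "Ghana"),
]
tree = build_tree(rows)
africa = tree.children[0]
print([c.data for c in africa.children])  # ['North Africa', 'West Africa']
```

Because children are looked up per parent, the same name may appear under different parents in this variant.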

For "get the code", keep a dictionary mapping each country to its location node, and use the parent links in the nodes. Traverse from the node to the top (until parent is None); at each level, the corresponding part of the code is the node's index in its parent's children list.
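Roughly, the lookup could look like the sketch below. The `by_name` dictionary, the sample nodes, and the choice of 1-based codes joined with dots are assumptions made for illustration; the constructor also attaches the node to its parent here, just to keep the example short:

```python
class Location(object):
    def __init__(self, data, parent=None):
        self.data = data
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def get_code(name, by_name):
    """Return a dotted code such as '2.1.1', or '' if the name is unknown."""
    node = by_name.get(name)
    if node is None:
        return ""
    parts = []
    while node.parent is not None:
        # Assumed convention: 1-based position among the siblings.
        parts.append(str(node.parent.children.index(node) + 1))
        node = node.parent
    # Parts were collected bottom-up, so reverse them for display.
    return ".".join(reversed(parts))


root = Location(None)
africa = Location("Africa", root)
north = Location("North Africa", africa)
west = Location("West Africa", africa)
ghana = Location("Ghana", west)
asia = Location("Asia", root)
east = Location("East Asia", asia)
japan = Location("Japan", east)

by_name = {n.data: n for n in (africa, north, west, ghana, asia, east, japan)}
print(get_code("Ghana", by_name))  # 1.2.1
print(get_code("Japan", by_name))  # 2.1.1
```

Note that `children.index(node)` is a linear scan; storing the index on the node when it is added would avoid that.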

Eli Bendersky 2010-09-20 17:10:29

Answer 2

+7 A:

I would recommend, assuming you can count on there being no duplication among the names, something like:

class Node(object):
    byname = {}

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.byname[name] = self
        if parent is None:  # root pseudo-node
            self.code = 0
        else:  # all normal nodes
            self.parent.children.append(self)
            self.code = len(self.parent.children)

    def get_codes(self, codelist):
        if self.code:
            codelist.append(str(self.code))
            self.parent.get_codes(codelist)

root = Node('')

def get_code(nodename):
    node = Node.byname.get(nodename)
    if node is None: return ''
    codes = []
    node.get_codes(codes)
    codes.reverse()
    return '.'.join(codes)

Do you also want to see the Python code for how to add a node given a hierarchical sequence of names, such as ['Africa', 'North Africa', 'Morocco']? I hope it would be pretty clear given the above structure, so you might want to do it yourself as an exercise, but of course do ask if you'd rather see a solution instead;-).

Getting the hierarchical sequence of names from a text line (string) depends on what the separators are -- in your example it looks like it's just a bunch of spaces added for purely aesthetic reasons connected with lining up the columns (if that's the case I'd recommend a simple re based approach to split on sequence of two+ spaces), but if it's actually (e.g.) tab characters as the separators, the csv module from Python's standard library would serve you better. I just can't tell from the short example you posted in your Q!-)
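For the spaces-as-padding case, a small sketch of the re based split might be the following. The function name `split_names` and the sample line are made up, and the actual file format is an assumption:

```python
import re


def split_names(line):
    """Split a line on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())


line = "Africa    West Africa     Ghana"
print(split_names(line))  # ['Africa', 'West Africa', 'Ghana']
print(line.split())       # ['Africa', 'West', 'Africa', 'Ghana']
```

The second call shows why a plain whitespace split is not enough: it breaks a multi-word name such as 'West Africa' into two strings.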

Edit: the OP says they can get the sequence of names just fine but would like to see the code to add the relevant nodes from those -- so, here goes!-)

def addnodes(names):
    parent = root
    for name in names:
        newnode = Node.byname.get(name)
        if newnode is None:
            newnode = Node(name, parent)
        parent = newnode

See why it's important that node names are unique, to make the above class work? Since Node.byname is a single per-class dict, it can record only one "corresponding node" for each given name -- thus, a name that's duplicated in two or more places in the hierarchy would "clash" and only one of the two or more nodes would be properly recorded.

But then again, the function get_code which the OP says is the main reason for this whole apparatus couldn't work as desired if a name could be ambiguous, since the OP's specs mandate it returning only one string. So, some geographical list like

America   United States    Georgia
Europe    Eastern Europe   Georgia

(where two completely unrelated areas just happen to be both named 'Georgia' -- just the kind of thing that unfortunately often happens in real-world geography, as the above example shows!-) would destroy the whole scheme (depending on how the specs for get_code happen to be altered to deal with an ambiguous-name argument, of course, the class structure could surely be altered accordingly to accommodate the new, drastically different specs!).
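To see the clash concretely, here is a self-contained run that repeats the class and the two functions from above and feeds in those two rows. Only the calls at the bottom are new:

```python
class Node(object):
    byname = {}

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.byname[name] = self
        if parent is None:  # root pseudo-node
            self.code = 0
        else:  # all normal nodes
            self.parent.children.append(self)
            self.code = len(self.parent.children)

    def get_codes(self, codelist):
        if self.code:
            codelist.append(str(self.code))
            self.parent.get_codes(codelist)


root = Node('')


def get_code(nodename):
    node = Node.byname.get(nodename)
    if node is None:
        return ''
    codes = []
    node.get_codes(codes)
    codes.reverse()
    return '.'.join(codes)


def addnodes(names):
    parent = root
    for name in names:
        newnode = Node.byname.get(name)
        if newnode is None:
            newnode = Node(name, parent)
        parent = newnode


addnodes(['America', 'United States', 'Georgia'])
addnodes(['Europe', 'Eastern Europe', 'Georgia'])

print(get_code('Eastern Europe'))  # 2.1
print(get_code('Georgia'))         # 1.1.1
# The second 'Georgia' reused the American node, so nothing was added here:
print(len(Node.byname['Eastern Europe'].children))  # 0
```

The European Georgia is silently dropped: the lookup in `addnodes` finds the existing node by name and never creates a second one.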

The nice thing about encapsulating these design decisions in a class (albeit in this case with a couple of accompanying functions -- they could elegantly be made into class methods, of course, but the OP's specs rigidly demand that get_code be a function, so I decided that, in that case, addnodes might as well also be one!-) is that the specific design decisions are mostly hidden from the rest of the code and thus can easily be altered (as long as specs never change, of course -- that's why it's so crucial to spend time and attention defining one's API specs, much more than on any other part of design and coding!-) to refactor the internal behavior (e.g. for optimization, ease of debugging/testing, and so on) while keeping the API-specified semantics intact, and thus leaving all other parts of the application pristine (not even needing re-testing, actually, as long, of course, as the parts that implement the API are very thoroughly unit-tested -- not hard to do, since they're nicely isolated and stand-alone!-).

Alex Martelli 2010-09-20 17:17:33

Is there a need for the use of `re`? `csv` can handle multiple spaces with `skipinitialspace`, but even simpler: `''.split` would split fine.

Muhammad Alkarouri 2010-09-20 19:09:14

This looks awesome, thank you! :) Yes, please could you give an example of how to add a node given the list? I can get to the list from the string just fine, but I'm scratching my head over how to turn it into a node.

AP257 2010-09-20 21:17:52

@AP257: Hint: the constructor `__init__` above takes `parent` as a parameter. So you probably want to construct the parent node and then its child and so on.

Muhammad Alkarouri 2010-09-20 22:02:27

@Muhammad, wrt your 1st comment: splitting by a space, an arbitrary sequence of spaces, or even in CSV with the skip-initial-space, would e.g. make `West` and `Africa` into **separate** strings -- don't you see what a total disaster that would be wrt the OP's stated intention to have `'West Africa'` as one node name?!

Alex Martelli 2010-09-20 22:08:59

@Alex: yes, I missed the 2+ spaces. I make it a rule to avoid `re` if a simple string method works, but this time I was misguided. Sorry! _hides under a rock_

Muhammad Alkarouri 2010-09-20 23:19:12

Alex - you rock! Thanks for the detailed explanation.

AP257 2010-09-21 08:57:39

@AP257, you're welcome!

Alex Martelli 2010-09-21 15:06:14

Answer 3

A:

I am not sure I have got this right. If we keep every object in a global dict, it defeats the purpose of using a tree, which is then only used to construct the numbering scheme. But a tree-based representation looks something like this:

class Location(object):

    allLocation = {}

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.number = "0"
        self.children = {}

    def putChild(self, childLocation):
        if childLocation.name not in self.allLocation.keys():
            # Now adjust the number scheme
            #if self.number is "0":
            # this is root
            numScheme = str(len(self.children) + 1)

            childLocation.setNumber(numScheme)

            # Add the child
            self.children[childLocation.number] = childLocation
            self.allLocation[childLocation.name] = childLocation
            childLocation.parent = self
            return 0
        else:
            return 1  # A location with this name is already registered

    def setNumber(self, num):
        if self.number != "0":
            # raise an exception, number already adjusted
            pass
        else:
            # set the number 
            self.number = num


    def locateChild(self, numScheme):
        # Logic should be to break up the numScheme and pass the token successively 
        numSchemeList = []

        if type(numScheme) is str:
            numSchemeList = numScheme.split(".")
        else:
            numSchemeList = numScheme


        if len(numSchemeList) >= 1:
            k = numSchemeList.pop(0)  # consume the leading (top-level) token first
            # if the child is available 

            if k in self.children.keys():
                childReferenced = self.children[k]


                # Is child of child required
                if len(numSchemeList) >= 1:
                    return childReferenced.locateChild(numSchemeList)
                else:
                    return childReferenced
            else:
                # No such child
                return None
        else:
            # The list is empty , search ends here
            return None

    def getScheme(self, name):
        if name in self.allLocation.keys():
            locObj = self.allLocation[name]
            return locObj.getNumScheme(name, "")
        else:
            return None

    def getNumScheme(self, name, numScheme="0"):
        if not self.parent:
            return numScheme
        if numScheme != "":
            return self.parent.getNumScheme(name, self.number + "." + numScheme)
        else:
            return self.parent.getNumScheme(name, self.number )



root = Location("root")
africa = Location("Africa")
asia = Location("Asia")
america = Location("America")
root.putChild(africa)
root.putChild(asia)
root.putChild(america)

nafrica = Location("North Africa")
africa.putChild(nafrica)

nafrica.putChild(Location("Morocco"))

obj = root.locateChild("1.1.1")
print(obj.name)

print(root.getScheme("Morocco"))

pyfunc 2010-09-20 18:41:37

Answer 4

A:

This code may be hideous, but I want to post it anyway because I put some time into it :)

tree = file_to_list_of_tuples(thefile)
d = {}
i = 1
for continent, region, country in tree:
   if continent not in d:
      d[continent] = [i, 0, 0]
      i += 1
   cont_code = d[continent][0]
   if region not in d:
      max_reg_code =  max( [y for x, y, z in d.values() if x==cont_code] )
      d[region] = [cont_code, max_reg_code+1 , 0]
   reg_code = d[region][1]
   if country not in d:
      max_country_code = max( [z for x, y, z in d.values() if x == cont_code and y== reg_code] )
      d[country] = [cont_code, reg_code, max_country_code+1]

def get_code(x):
   print(d[x])

get_code prints lists, but you can easily change it to print in the format you want.
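For instance, one possible dotted format, sketched with a hand-written sample of the dictionary. The sample entries are illustrative, and dropping the zero placeholders is an assumption about the desired output:

```python
# Sample of the kind of dictionary the loop above builds (illustrative values).
d = {
    "Africa": [1, 0, 0],
    "North Africa": [1, 1, 0],
    "Morocco": [1, 1, 1],
}


def get_code(name):
    """Return the code as a dotted string, skipping the 0 placeholders."""
    parts = d[name]
    return ".".join(str(p) for p in parts if p != 0)


print(get_code("Africa"))        # 1
print(get_code("North Africa"))  # 1.1
print(get_code("Morocco"))       # 1.1.1
```

This version returns the string instead of printing it, so the caller decides what to do with it.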

dheerosaur 2010-09-20 19:20:24
