I would recommend, assuming you can count on there being no duplication among the names, something like:
class Node(object):
byname = {}
def __init__(self, name, parent=None):
self.name = name
self.parent = parent
self.children = []
self.byname[name] = self
if parent is None: # root pseudo-node
self.code = 0
else: # all normal nodes
self.parent.children.append(self)
self.code = len(self.parent.children)
def get_codes(self, codelist):
if self.code:
codelist.append(str(self.code))
self.parent.get_codes(codelist)
root = Node('')
def get_code(nodename):
node = Node.byname.get(nodename)
if node is None: return ''
codes = []
node.get_codes(codes)
codes.reverse()
return '.'.join(codes)
Do you also want to see the Python code for how to add a node given a hierarchical sequence of names, such as ['Africa', 'North Africa', 'Morocco']
? I hope it would be pretty clear given the above structure, so you might want to do it yourself as an exercise, but of course do ask if you'd rather see a solution instead;-).
Getting the hierarchical sequence of names from a text line (string) depends on what the separators are -- in your example it looks like it's just a bunch of spaces added for purely aesthetic reasons connected with lining up the columns (if that's the case I'd recommend a simple re
based approach to split on sequence of two+ spaces), but if it's actually (e.g.) tab characters as the separators, the csv
module from Python's standard library would serve you better. I just can't tell from the short example you posted in your Q!-)
Edit: the OP says they can get the sequence of names just fine but would like to see the code to add the relevant nodes from those -- so, here goes!-)
def addnodes(names):
parent = root
for name in names:
newnode = Node.byname.get(name)
if newnode is None:
newnode = Node(name, parent)
parent = newnode
See why it's important that node names are unique, to make the above class work? Since Node.byname
is a single per-class dict
, it can record only one "corresponding node" for each given name -- thus, a name that's duplicated in two or more places in the hierarchy would "clash" and only one of the two or more nodes would be properly recorded.
But then again, the function get_code
which the OP says is the main reason for this whole apparatus couldn't work as desired if a name could be ambiguous, since the OP's specs mandate it returning only one string. So, some geographical list like
America United States Georgia
Europe Eastern Europe Georgia
(where two completely unrelated areas just happen to be both named 'Georgia'
-- just the kind of thing that unfortunately often happens in real-world geography, as the above example shows!-) would destroy the whole scheme (depending on how the specs for get_code
happen to be altered to deal with an ambiguous-name argument, of course, the class structure could surely be altered accordingly and accomodate the new, drastically different specs!).
The nice thing about encapsulating these design decisions in a class (albeit in this case with a couple of accompanying functions -- they could be elegantly be made into class methods, of course, but the OP's specs rigidly demand that get_code
be a function, so I decided that, in that case addnodes
might as well also be one!-) is that the specific design decisions are mostly hidden from the rest of the code and thus can easily be altered (as long as specs never change, of course -- that's why it's so crucial to spend time and attention defining one's API specs, much more than on any other part of design and coding!-) to refactor the internal behavior (e.g. for optimization, ease of debugging/testing, and so on) while maintaining API-specified semantics intact, and thus leaving all other parts of the application pristine (not even needing re-testing, actually, as long of course as the parts that implement the API are very thoroughly unit-tested -- not hard to do, since they're nicely isolated and stand-alone!-).