views:

214

answers:

3

Here is the problem I am trying to solve, (I have simplified the actual problem, but this should give you all the relevant information). I have a hierarchy like so:

1.A
1.B
1.C
2.A
3.D
4.B
5.F

(This is hard to illustrate - each number is the parent, each letter is the child).

  1. Creating an instance of the 'letter' objects is expensive (IO, database costs, etc), so should only be done once.

  2. The hierarchy needs to be easy to navigate.

  3. Children in the hierarchy need to have just one parent.

  4. Modifying the contents of the letter objects should be possible directly from the objects in the hierarchy.

  5. There needs to be a central store containing all of the 'letter' objects (and only those in the hierarchy).

  6. 'letter' and 'number' objects need to be possible to create from a constructor (such as Letter(**kwargs) ).

  7. It is perfectably acceptable to expect that when a letter changes from the hierarchy, all other letters will respect the same change.

Hope this isn't too abstract to illustrate the problem.

What would be the best way of solving this? (Then I'll post my solution)

Here's an example script:

one = Number('one')
a = Letter('a')
one.addChild(a)
two = Number('two')
a = Letter('a')
two.addChild(a)

for child in one:
    child.method1()
for child in two:
    print '%s' % child.method2()
A: 

A basic approach will use builtin data types. If I get your drift, the Letter object should be created by a factory with a dict cache to keep previously generated Letter objects. The factory will create only one Letter object for each key.

A Number object can be a sub-class of list that will hold the Letter objects, so that append() can be used to add a child. A list is easy to navigate.

A crude outline of a caching factory:

>>> class Letters(object):
...     def __init__(self):
...      self.cache = {}
...     def create(self, v):
...      l = self.cache.get(v, None)
...      if l:
...       return l
...      l = self.cache[v] = Letter(v)
...      return l
>>> factory=Letters()
>>> factory.cache
{}
>>> factory.create('a')
<__main__.Letter object at 0x00EF2950>
>>> factory.create('a')
<__main__.Letter object at 0x00EF2950>
>>>

To fulfill requirement 6 (constructor), here is a more contrived example, using __new__, of a caching constructor. This is similar to Recipe 413717: Caching object creation .

class Letter(object):
    cache = {}

    def __new__(cls, v):
        o = cls.cache.get(v, None)
        if o:
            return o
        else:
            o = cls.cache[v] = object.__new__(cls)
            return o

    def __init__(self, v):
        self.v = v
        self.refcount = 0

    def addAsChild(self, chain):
        if self.refcount > 0:
            return False
        self.refcount += 1
        chain.append(self)
        return True

Testing the cache functionality

>>> l1 = Letter('a')
>>> l2 = Letter('a')
>>> l1 is l2
True
>>>

For enforcing a single parent, you'll need a method on Letter objects (not Number) - with a reference counter. When called to perform the addition it will refuse addition if the counter is greater than zero.

l1.addAsChild(num4)
gimel
This is a good start, but it doesn't satisfy 3 or 6.
Dan
for 6, you can easily hide a factory behind any callable. Are you sure about the constructor requirement?
gimel
Cool, that's more like how I've ended up implementing it, so sounds like I'm on the right lines. One thing I found though - the __init__ is always executed after the __new__ (regardless if it returns an existing object), which is fine in this instance, but what if the __init__ does a lot more?
Dan
Have __init__ check for presence of a flag - how about refcount?
gimel
A: 

Creating an instance of the 'letter' objects is expensive (IO, database costs, etc), so should only be done once.

This is where I'd begin. It seems like it would be easy to get out of the way and would give you a lot more freedom to implement the last 6.

Perhaps you could consider a solution like memcached?

Jason Baker
A: 

I wonder if you have made this problem too abstract for people to give useful answers. It seems to me that you might be doing something with genetic codes, and if that is the case, you should look into BioPython. Not only does it come with libraries for manipulating genetic sequences and accessing gene data repositories, but there is also a community of biology people using it who can advise on concrete best practices, knowing the data that is used and the manipulations that people want to do.

Michael Dillon