views:

271

answers:

2

I have some code where instances of classes have parent<->child references to each other, e.g.:

class Node(object):
  def __init__(self):
    self.parent = None
    self.children = {}
  def AddChild(self, name, child):
    child.parent = self
    self.children[name] = child

def Run():
  root, c1, c2 = Node(), Node(), Node()
  root.AddChild("first", c1)
  root.AddChild("second", c2)
Run()

I think this creates circular references such that root, c1 and c2 won't be freed after Run() is completed, right?. So, how do get them to be freed? I think I can do something like root.children.clear(), or self.parent = None - but what if I don't know when to do that?

Is this an appropriate time to use the weakref module? What, exactly, do I weakref'ify? the parent attribute? The children attribute? The whole object? All of the above? I see talk about the WeakKeyDictionary and weakref.proxy, but its not clear to me how they should be used, if at all, in this case.

This is also on python2.4 (can't upgrade).

Update: Example and Summary

What objects to weakref-ify depends on which object can live without the other, and what objects depend on each other. The object that lives the longest should contain weakrefs to the shorter-lived objects. Similarly, weakrefs should not be made to dependencies - if they are, the dependency could silently disappear even though it is still needed.

If, for example, you have a tree structure, root, that has children, kids, but can exist without children, then the root object should use weakrefs for its kids. This is also the case if the child object depends on the existence of the parent object. Below, the child object requires a parent in order to compute its depth, hence the strong-ref for parent. The members of the kids attribute are optional, though, so weakrefs are used to prevent a circular reference.

class Node:
  def __init__(self)
    self.parent = None
    self.kids = weakref.WeakValueDictionary()
  def GetDepth(self):
    root, depth = self, 0
    while root:
      depth += 1
      root = root.parent
    return count
root = Node()
root.kids["one"] = Node()
root.kids["two"] = Node()
# do what you will with root or sub-trees of it.

To flip the relationship around, we have something like the below. Here, the Facade classes require a Subsystem instance to work, so they use a strong-ref to the subsystem they need. Subsystems, however, don't require a Facade to work. Subsystems just provide a way to notify Facades about each other's actions.

class Facade:
  def __init__(self, subsystem)
    self.subsystem = subsystem
    subsystem.Register(self)

class Subsystem:
  def __init__(self):
    self.notify = []
  def Register(self, who):
    self.notify.append(weakref.proxy(who))

sub = Subsystem()
f1 = CliFacade(sub)
f2 = WebFacade(sub)
# Go on to reading from POST, stdin, etc
+8  A: 

Yep, weakref's excellent here. Specifically, instead of:

self.children = {}

use:

self.children = weakref.WeakValueDictionary()

Nothing else needs change in your code. This way, when a child has no other differences, it just goes away -- and so does the entry in the parent's children map that has that child as the value.

Avoiding reference loops is up high on a par with implementing caches as a motivation for using the weakref module. Ref loops won't kill you, but they may end up clogging your memory, esp. if some of the classes whose instances are involved in them define __del__, since that interferes with the gc's module ability to dissolve those loops.

Alex Martelli
Also, if you are sure you won't be needing the cyclic gc, you can disable it for a small performance increase.
gnibbler
Thanks, Alex. Is there a specific reason to weakref `children` rather than `parent`? Would the effect be the same? What would happen if `parent` was weakref'd, too? In the case of a double-linked list, should `prev`, `next`, or both be weakrefs?
Richard Levasseur
This is bad suggestion. All children in the example will be destructed just after return from `Run()`. In general you almost always bind a root of structure to variable, so the right way is to use `weakref` for `parent`, but not `children`.
Denis Otkidach
I also found this message which recommends weakrefs towards the root rather than leaves: http://74.125.155.132/search?q=cache:http://mail.python.org/pipermail/python-list/2009-March/705913.html
Richard Levasseur
The point is not "parent" or "child" but "which one can exist without the other". In this toy example there's little indication, except that we _know_ that the parent _can_ exist without children (because it starts that way!) but we don't know if the converse is true -- maybe the children intrinsically need some services from the parent, and if so they should normal-reference AKA strong-reference it. @Denis, of course everything (parent AND children) gets destroyed at end of Run in this toy example -- d'oh: that's what **makes** it toy, and doesn't invalidate my suggestion.
Alex Martelli
Thanks. I added an example/summary that combines yours and Denis's answers. I also marked it as community wiki in case I've misunderstood something.
Richard Levasseur
+4  A: 

I suggest using child.parent = weakref.proxy(self). This is good solution to avoid circular references in case when lifetime of (external references to) parent covers lifetime of child. Contrary, use weakref for child (as Alex suggested) when lifetime of child covers lifetime of parent. But never use weakref when both parent and child can be alive without other.

Here these rules are illustrated with examples. Use weakref-ed parent if you store root in some variable and pass it around, while children are accessed from it:

def Run():
  root, c1, c2 = Node(), Node(), Node()
  root.AddChild("first", c1)
  root.AddChild("second", c2)
  return root # Note that only root refers to c1 and c2 after return, 
              # so this references should be strong

Use weakref-ed children if you bind all them to variables, while root is accessed through them:

def Run():
  root, c1, c2 = Node(), Node(), Node()
  root.AddChild("first", c1)
  root.AddChild("second", c2)
  return c1, c2

But neither will work for the following:

def Run():
  root, c1, c2 = Node(), Node(), Node()
  root.AddChild("first", c1)
  root.AddChild("second", c2)
  return c1
Denis Otkidach
There's plenty of cases in which either or both of child and parent can be alive without the other _as long as other entities yet hold references to them_; that's when you might want to use mutually weak references (those other external references will do the job of keeping the entities alive exactly as long as needed).
Alex Martelli