views:

171

answers:

4

I've been getting RuntimeError: maximum recursion depth exceeded when trying to pickle a highly-recursive tree object. Much like this asker here.

He solved his problem by setting the recursion limit higher with sys.setrecursionlimit. But I don't want to do that: I think that's more of a workaround than a solution. Because I want to be able to pickle my trees even if they have 10,000 nodes in them. (It currently fails at around 200.)

(Also, every platform's true recursion limit is different, and I would really like to avoid opening this can of worms.)

Is there any way to solve this at the fundamental level? If only the pickle module would pickle using a loop instead of recursion, I wouldn't have had this problem. Maybe someone has an idea how I can cause something like this to happen, without rewriting the pickle module?

Any other idea how I can solve this problem will be appreciated.

A: 

I suppose that most people never use recursive structures of such depth. Since easiest serialization implementations are recursive, you're going to only see them.

If I were you I'd not use an openly recursive data structure here. Instead I'd number every node and use a link table that efficiently translates a number to a node with that number. Every node would refer to other nodes (e.g. its children) via that table, using numbers. A simple property would make this syntactically easy. Beyond that properties, no code dealing with tree traversal would have to change. Node constructor will have to allocate a number and put itself into the link table, which is trivial, too.

The link table might be just a list of nodes, where index into the list serves as the node number; Python lists seem to have efficient access by index. If speed of inserts is important, I'd preallocate a long enough list filled with None; it would not take too much space. If nodes stored their own numbers, this structure would be cheaply traversable in both directions.

As you see, pickling and unpickling such a tree would be trivial at any depth.

9000
So you're saying, avoid having pointers from a node to its children and parent. It would indeed solve the problem, but it will be really annoying not to have pointers. That's compromising on the program's data architecture just because of `pickle`'s problematic implementation.
cool-RR
Not exactly. This approach will have the same _interface_ as if the pointers were simple python references. It's a matter of a simple property definition, with 'get' operation being fairly efficient.
9000
+1  A: 

To make understanding easier, here's a complete example, with only one link to simplify it:

linker = [] 
#
class Node(object):
  def __init__(self, payload):
    global linker
    self.payload = payload
    self.__next = None
    self.__index = len(linker)
    linker.append(self)
  #
  def getNext(self):
    global linker
    if self.__next is not None:
      return linker[self.__next]
  #
  def setNext(self, another):
    if another is not None:
      self.__next = another.__index
    else:
      self.__next = None
  #
  next = property(getNext, setNext)
  #
  def __str__(self):
    return repr(self.payload)


a = Node("One")
b = Node("Two")
c = Node("Three")

b.next = c
a.next = b

# prints "One" "Two" "Three"
print a, a.next, a.next.next
9000
Thanks. I still feel it's too hacky though.
cool-RR
A: 

Just dont use recursion. Make a stack (list / queue) with open nodes and process this.

Something like this (pseudo code)

stack.add(root)
while not list.empty:
    current = stack.pop
    // process current
    for each child of current:
        stack.add(child)

That should do it

Mene
A: 

I think a good solution is a combination of Mene's and 9000's answers. Given that nodes have globally unique IDs (maybe somehow memory addresses can be used as such) you can do this. Granted this is a sloppy pseudo-implementation, but with a bit of abstraction if encapsulated in a tree class it could be very simple.

def all_nodes(node): # walk the tree and get return all nodes as a list
    if node:
        nodes = []
        for child in node.children:
            for sub_child in all_nodes(child):
                nodes.append(sub_child)
        return nodes
    return []


class Node(object):
    def __init__(self, children, id):
        self.children = children
        self.id = id

    def __getstate__(self): #when pickling translate children into IDs
        tmp = self.__dict__.copy()
        children_ids = []
        for child in tmp['children']:
            children_ids.append(child.id)
        tmp['children_ids'] = children_ids
        return tmp


lookup = dict()


for node in all_nodes(rootNode): # put all nodes into a dictionary
    lookup[node.id] = node
#then pickle the dictionary
#then you can unpickle it and walk the dictionary
for id, node in lookup:
    del node.children
    node.children = []
    for child in node.children_ids:
        node.children.append(lookup[child])
#and three should now be rebuilt
Novikov