tags:

views:

799

answers:

5

I have an object graph wherein each child object contains a property that refers back to its parent. Are there any good strategies for ignoring the parent references in order to avoid infinite recursion? I have considered adding a special [Parent] attribute to these properties or using a special naming convention, but perhaps there is a better way.

+21  A: 

If the loops can be generalised (you can have any number of elements making up the loop), you can keep track of objects you've seen already in a HashSet and stop if the object is already in the set when you visit it. Or add a flag to the objects which you set when you visit it (but you then have to go back & unset all the flags when you're done, and the graph can only be traversed by a single thread at a time).

Alternatively, if the loops will only be back to the parent, you can keep a reference to the parent and not loop on properties that refer back to it.

For simplicity, if you know the parent reference will have a certain name, you could just not loop on that property :)

thecoop
My final solution was to pass a "parent" reference with each recursive call, then to skip any property referencing the parent. Works great!
nw
@nw: what if you have node A with child B, node B with child C and node C with child A? Passing just the parent reference doesn't always work.
Eric Lippert
@Eric Lippert: Good point. In my case the graph is more or less hierarchical, so the simpler solution will suffice.
nw
+2  A: 

If you're doing a graph traversal, you can have a "visited" flag on each node. This ensures that you don't revisit a node and possibly get stuck in an infinite loop. I believe this is the standard way of performing a graph traversal.

Vivin Paliath
Unless the graph is shared by many thread. In that case you need to keep the information somewhere in the current thread.
Victor Hurdugaci
In fact it is worse than @Victor's scenario. Suppose you want to do a query on the cartesian product of the graph *with itself*. The naive way to do that is to traverse the graph, and then, for each element in the traversal, start up a fresh traversal. Now you need to be able to traverse the same graph twice at the same time *in the same thread*. Setting a flag is a standard way of doing this, yes, but it is also a bad idea. Traversing a graph should not mutate it.
Eric Lippert
Those are special cases. I just provided a general solution that should work for the most common case: one thread traversing a graph *once*. More complicated situations will require other solutions.
Vivin Paliath
I note also that this solution consumes O(n) extra memory that survives the lifetime of the graph, whereas solutions which use external memory use O(n) extra memory that is deallocated after the traversal.
Eric Lippert
+2  A: 

I'm not exactly sure what you are trying to do here but you could just maintain a hashtable with all previously visited nodes when you are doing your breadth first search of depth first search.

ruibm
+7  A: 

What a coincidence; this is the topic of my blog this coming Monday. See it for more details. Until then, here's some code to give you an idea of how to do this:

static IEnumerable<T> Traversal<T>(
    T item,
    Func<T, IEnumerable<T>> children)
{
    var seen = new HashSet<T>();
    var stack = new Stack<T>();
    seen.Add(item);
    stack.Push(item); 
    yield return item;
    while(stack.Count > 0)
    {
        T current = stack.Pop();
        foreach(T newItem in children(current))
        {
            if (!seen.Contains(newItem))
            {
                seen.Add(newItem);
                stack.Push(newItem);
                yield return newItem;
            }
        }
    } 
}

The method takes two things: an item, and a relation that produces the set of everything that is adjacent to the item. It produces a depth-first traversal of the transitive and reflexive closure of the adjacency relation on the item. Let the number of items in the graph be n, and the maximum depth be 1 <= d <= n, assuming the branching factor is not bounded. This algorithm uses an explicit stack rather than recursion because (1) recursion in this case turns what should be an O(n) algorithm into O(nd), which is then something between O(n) and O(n^2), and (2) excessive recursion can blow the stack if the d is more than a few hundred nodes.

Note that the peak memory usage of this algorithm is of course O(n + d) = O(n).

So, for example:

foreach(Node node in Traversal(myGraph.Root, n => n.Children))
  Console.WriteLine(node.Name);

Make sense?

Eric Lippert
Out of interest, whats the general limit for the number of stack frames you can have before a StackOverflowException is thrown?
thecoop
@thecoop: You get by default a million bytes of stack per thread. How many bytes a given activation takes up depends on details like "how big is a reference?" (64 bits on 64 bit machines, 32 bit on 32 bit machines...) so it is hard to give a precise statement of the limits. I would not feel comfortable with a recursion more than a few hundred deep.
Eric Lippert
+1  A: 

This is a common problem, but the best approach depends on the scenario. An additional problem is that in many cases it isn't a problem visiting the same object twice - that doesn't imply recursion - for example, consider the tree:

A
=> B
   => C
=> D
   => C

This may be valid (think XmlSerializer, which would simply write the C instance out twice), so it is often necessary to push/pop objects on a stack to check for true recursion. The last time I implemented a "visitor", I kept a "depth" counter, and only enabled the stack checking beyond a certain threshold - that means that most trees simply end up doing some ++/--, but nothing more expensive. You can see the approach I took here.

Marc Gravell