views:

710

answers:

4

Hello people.

I'm trying to figure out how to implement a function in a tree node which returns all its descendant leaves (whether direct or indirect). However, I don't want to pass a container in which the leaf nodes will be put recursively (the tree could be huge), instead I'd like to use a generator to iterate through the tree. I've tried a few approaches but none of them worked so far. This one is the closest I've come to a possible solution:

    public interface ITreeNode
    {
        IEnumerable<ITreeNode> EnumerateLeaves();            
    }

    class Leaf : ITreeNode
    {
        public IEnumerable<ITreeNode> EnumerateLeaves()
        {
            throw new NotImplementedException();
        }
    }

    class Branch : ITreeNode
    {
        private List<ITreeNode> m_treeNodes = new List<ITreeNode>();

        public IEnumerable<ITreeNode> EnumerateLeaves()
        {
            foreach( var node in m_treeNodes )
            {
                if( node is Leaf )
                    yield return node;
                else
                    node.EnumerateLeaves();
            }
        }
    }

But this is not working either. What am I doing wrong? Seems like calling .EnumerateLeaves recursively won't work if there's a yield statement in the same function.

Any help would be very much appreciated. Thanks in advance.

Edit: I forgot to mention that a branch can have either leaves or branches as children, hence the recursion.

+7  A: 

Here's how you should implement Branch.EnumerateLeaves:

public IEnumerable<ITreeNode> EnumerateLeaves()
{
    foreach( var node in m_treeNodes )
    {
        if( node is Leaf )
            yield return node;
        else
        {
            foreach (ITreeNode childNode in node.EnumerateLeaves())
                yield return childNode;
        }
    }
}
Lasse V. Karlsen
That won't work either since a Branch could have another braches as children, and that's why I used recursion.
Trap
Let me edit the answer.
Lasse V. Karlsen
There, that should do the trick, now it will use recursion correctly.
Lasse V. Karlsen
Heh - I posted this just as you edited your answer.
Erik Forbes
So basically the solution is to create a new generator every time you go deeper in the recursion?
Trap
Yes, if you use recursion.
Erik Forbes
So that would be O(n log n) iteration over the tree?
Jules
See Erik's answer for more info about performance. As always, profile the code and if it's a bottleneck, refactor it.
Lasse V. Karlsen
O(n^2) actually. See the link at the bottom of my answer for why.
Erik Forbes
I read the link. It's O(n log n). That's because the recursion depth for walking a tree is log n, not n.
recursive
It depends on what n is and how balanced your tree is :) If your tree is deep but not wide, then it will be closer to O(n^2), where n is the number of nodes in the tree.
Daniel Plaisted
O(n^2) worst case I should have said. Apologies.
Erik Forbes
+3  A: 

lassevk is right - one potential issue with that method, however, is that recursively calling into enumerables can yield O(n^2) performance. If this is an issue, then you should instead factor the recursion out and use an internal stack.

public IEnumerable<ITreeNode> EnumerateLeaves()
{
    Stack<ITreeNode> workStack = new Stack<ITreeNode>(m_treeNodes);

    while(workStack.Count > 0) {
        var current = workStack.Pop();
        if(current is Leaf)
            yield return current;
        else {
            for(n = 0; n < current.m_treeNodes.Count; n++) {
                workStack.Push(current.m_treeNodes[n]);
            }
        }
    }
}

This should perform the same function, without the recursion.

Edit: Daniel Plaisted posted in a comment about a bigger problem with recursively calling Enumerators, summed up in a post on the MSDN Blogs regarding iterators. Thanks Daniel. =)

Another Edit: Always remember that code simplicity can be more important than performance, especially if you expect others to maintain your code. If you don't expect your tree to grow very large, use the recursion method lassevk outlined in his answer.

Erik Forbes
The GC isn't the biggest problem with recursion, it's the fact that on each iteration it has to walk down the "tree" of enumerators from the top. This can make the runtime increase dramatically. See http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx
Daniel Plaisted
That's very true - though I couldn't think of how to phrase it. Thanks for the link - very helpful. =)
Erik Forbes
A: 

This is a fantastic question for those of us unused to this style of coding - thanks for posting it!

Mike Burton
Thanks! I'm glad it was useful for someone else :)
Trap
+1  A: 

The other answers are correct, but the pattern of having a yield return inside a foreach loop with a recursive call will make the running time to iterate through the tree something like O(number of nodes * average depth of a node). See this blog post for more details about the problem.

Here is an attempt at a generator which is efficient in both runtime and memory usage:

class Node
{
    List<Node> _children;

    public bool IsLeaf { get { return _children.Count == 0; } }

    public IEnumerable<Node> Children { get { return _children; } }

    public IEnumerable<Node> EnumerateLeaves()
    {
        if (IsLeaf)
        {
            yield return this;
            yield break;
  }

        var path = new Stack<IEnumerator<Node>>();

        path.Push(Children.GetEnumerator());

        while(!path.Empty)
        {
            var cur = path.Pop();
            if (cur.MoveNext())
            {
                path.Push(cur);
                if (cur.IsLeaf)
                {
                    yield return cur;
                }
                else
                {
                    path.Push(cur.Children.GetEnumerator());
                }
            }

        }
    }

}
Daniel Plaisted
Can you explain how this is better than O(n^2)? I'm not seeing it.
Erik Forbes
It's O(n log n). In the link you provided, the recursion depth was n. In a tree, it's log n in an average case.
recursive
I updated it to state that the run time of the recursive method is O(number of nodes * average depth of a node). This one should have a runtime of just O(number of nodes), which is the same as Erik's. This one theoretically has less memory use.
Daniel Plaisted