I've been trying to find an answer to this question for a few hours now on the web and on this site, and I'm not quite there.

I understand that .NET allocates 1MB of stack space to apps, and that it's better to avoid a stack overflow by recoding than by forcing a larger stack size.

I'm working on a "shortest path" app that works great up to about 3000 nodes, at which point it overflows. Here's the method that causes problems:

    public void findShortestPath(int current, int end, int currentCost)
    {
        if (!weight.ContainsKey(current))
        {
            weight.Add(current, currentCost);
        }
        Node currentNode = graph[current];
        var sortedEdges = (from entry in currentNode.edges orderby entry.Value ascending select entry);
        foreach (KeyValuePair<int, int> nextNode in sortedEdges)
        {
            if (!visited.ContainsKey(nextNode.Key) || !visited[nextNode.Key])
            {
                int nextNodeCost = currentCost + nextNode.Value;
                if (!weight.ContainsKey(nextNode.Key))
                {
                    weight.Add(nextNode.Key, nextNodeCost);
                }
                else if (weight[nextNode.Key] > nextNodeCost)
                {
                    weight[nextNode.Key] = nextNodeCost;
                }
            }
        }
        visited.Add(current, true);
        foreach (KeyValuePair<int, int> nextNode in sortedEdges)
        {
            if(!visited.ContainsKey(nextNode.Key) || !visited[nextNode.Key]){
                findShortestPath(nextNode.Key, end, weight[nextNode.Key]);
            }
        }
    }//findShortestPath

For reference, the Node class has one member:

 public Dictionary<int, int> edges = new Dictionary<int, int>();

graph[] is:

  private Dictionary<int, Node> graph = new Dictionary<int, Node>();

I've tried to optimize the code so that it isn't carrying any more baggage than needed from one iteration (recursion?) to the next, but with a 100K-node graph in which each node has between 1 and 9 edges, it's going to hit that 1MB limit pretty quickly.

Anyway, I'm new to C# and code optimization. If anyone could give me some pointers (not like this), I would appreciate it.

+3  A: 

Is your Node a struct or a class? If it's the former, make it a class so that it's allocated on the heap instead of on the stack.

Paul Betts
All that will do is make the stack usage smaller, not eliminate it. The problem of deep recursion on large data structures will still be present.
Eric Lippert
That's true - at first I saw the number as only 3000 and thought that was somewhat small considering the 1MB of stack space, but if you do the math, it seems pretty reasonable.
Paul Betts
+15  A: 

The classic technique to avoid deep recursive stack dives is to simply avoid recursion by writing the algorithm iteratively and managing your own "stack" with an appropriate list data structure. Most likely you will need this approach here given the sheer size of your input set.
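As a minimal sketch of that idea, assuming the graph shape from the question (a `Dictionary<int, Node>` whose `Node.edges` maps a neighbor id to an edge cost): the class name `IterativeWalk` and the method `ReachableFrom` are hypothetical, and the code only shows the traversal skeleton, not the shortest-path bookkeeping. The pending work lives in a heap-allocated `Stack<int>`, so the depth of the walk is bounded by available memory rather than by the thread's stack.

```csharp
using System.Collections.Generic;

public class Node
{
    public Dictionary<int, int> edges = new Dictionary<int, int>();
}

public static class IterativeWalk
{
    // Visits every node reachable from start without any recursive calls.
    // Assumes every edge target is also a key in graph.
    public static HashSet<int> ReachableFrom(Dictionary<int, Node> graph, int start)
    {
        var visited = new HashSet<int>();
        var pending = new Stack<int>();   // explicit stack, allocated on the heap
        pending.Push(start);

        while (pending.Count > 0)
        {
            int current = pending.Pop();
            if (!visited.Add(current))
                continue;                 // already processed via another path

            foreach (KeyValuePair<int, int> edge in graph[current].edges)
            {
                if (!visited.Contains(edge.Key))
                    pending.Push(edge.Key);
            }
        }
        return visited;
    }
}
```

A chain of 100,000 nodes, which would need 100,000 nested calls in the recursive version, is handled by a single loop here.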

bobbymcr
Or make sure the recursive call can be optimized using tail recursion.
LBushkin
+6  A: 

You could convert the code to use a 'work queue' rather than being recursive. Something along the following pseudocode:

    Queue<Task> work = new Queue<Task>();  // seed this with the initial task(s)
    while (work.Count != 0)
    {
        Task t = work.Dequeue();
        // ... whatever
        foreach (Task more in t.MoreTasks)
            work.Enqueue(more);
    }

I know that is cryptic, but it's the basic concept of what you'll need to do. Since you're only getting to 3000 nodes with your current code, you will at best get to 12~15k even without any parameters. So you need to kill the recursion completely.
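One way the work-queue idea might look for the question's graph is sketched below. This is an assumption-laden illustration, not the original poster's algorithm: `QueueShortestPath` and `Costs` are hypothetical names, edge costs are assumed non-negative, and every edge target is assumed to exist in `graph`. A node is put back on the queue whenever a cheaper route to it is found, so the final costs are correct even though nodes may be processed more than once.

```csharp
using System.Collections.Generic;

public class Node
{
    public Dictionary<int, int> edges = new Dictionary<int, int>();
}

public static class QueueShortestPath
{
    // Returns the cheapest cost found from start to every reachable node.
    public static Dictionary<int, int> Costs(Dictionary<int, Node> graph, int start)
    {
        var weight = new Dictionary<int, int>();
        weight[start] = 0;

        var work = new Queue<int>();
        work.Enqueue(start);

        while (work.Count != 0)
        {
            int current = work.Dequeue();
            int currentCost = weight[current];

            foreach (KeyValuePair<int, int> edge in graph[current].edges)
            {
                int nextCost = currentCost + edge.Value;
                int known;
                // Re-queue a neighbor only when this route is new or cheaper.
                if (!weight.TryGetValue(edge.Key, out known) || nextCost < known)
                {
                    weight[edge.Key] = nextCost;
                    work.Enqueue(edge.Key);
                }
            }
        }
        return weight;
    }
}
```

Swapping the plain queue for a priority queue ordered by cost would avoid most of the repeated processing; the plain `Queue<int>` is kept here only to stay close to the pseudocode above.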

csharptest.net
Good point. In fact, your code is essentially a breadth-first traversal of the nodes, as opposed to the depth-first approach from the OP.
siz
A: 

I would first make sure I know why I'm getting a stack overflow. Is it actually because of the recursion? The recursive method isn't putting much onto the stack. Maybe it's because of the storage of the nodes?

Also, BTW, I don't see the end parameter ever changing. That suggests it doesn't need to be a parameter, carried on each stack frame.

John Saunders
+6  A: 

A while back I explored this problem in my blog. Or, rather, I explored a related problem: how do you find the depth of a binary tree without using recursion? A recursive tree depth solution is trivial, but blows the stack if the tree is highly imbalanced.

My recommendation is to study ways of solving this simpler problem, and then decide which of them, if any, could be adapted to your slightly more complex algorithm.

Note that in these articles the examples are given entirely in JScript. However, it should not be difficult to adapt them to C#.

Here we start by defining the problem.

http://blogs.msdn.com/ericlippert/archive/2005/07/27/recursion-part-one-recursive-data-structures-and-functions.aspx

The first attempt at a solution is the classic technique that you'll probably adopt: define an explicit stack; use it rather than relying upon the operating system and compiler implementing the stack for you. This is what most people do when faced with this problem.

http://blogs.msdn.com/ericlippert/archive/2005/08/01/recursion-part-two-unrolling-a-recursive-function-with-an-explicit-stack.aspx
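The linked article works in JScript; a rough C# rendering of the explicit-stack approach to the tree-depth problem might look like the following. The `Tree` and `TreeDepth` types are hypothetical stand-ins, and the convention that an empty tree has depth 0 is an assumption made for this sketch.

```csharp
using System;
using System.Collections.Generic;

public sealed class Tree
{
    public Tree Left;
    public Tree Right;
}

public static class TreeDepth
{
    // Depth of an empty tree is 0; a single node has depth 1.
    public static int Depth(Tree root)
    {
        int deepest = 0;
        // Each entry pairs a node with its depth, replacing one recursive call frame.
        var pending = new Stack<KeyValuePair<Tree, int>>();
        pending.Push(new KeyValuePair<Tree, int>(root, 1));

        while (pending.Count > 0)
        {
            KeyValuePair<Tree, int> item = pending.Pop();
            Tree node = item.Key;
            if (node == null)
                continue;             // an empty subtree contributes nothing

            deepest = Math.Max(deepest, item.Value);
            pending.Push(new KeyValuePair<Tree, int>(node.Left, item.Value + 1));
            pending.Push(new KeyValuePair<Tree, int>(node.Right, item.Value + 1));
        }
        return deepest;
    }
}
```

A badly imbalanced tree (effectively a linked list) is exactly the case that breaks the recursive version, and it costs this loop nothing extra in system stack space.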

The problem with that solution is that it's a bit of a mess. We can go even farther than simply making our own stack. We can make our own little domain-specific virtual machine that has its own heap-allocated stack, and then solve the problem by writing a program that targets that machine! This is actually easier than it sounds; the operations of the machine can be extremely high level.

http://blogs.msdn.com/ericlippert/archive/2005/08/04/recursion-part-three-building-a-dispatch-engine.aspx

And finally, if you are really a glutton for punishment (or a compiler developer) you can rewrite your program in Continuation Passing Style, thereby eliminating the need for a stack at all:

http://blogs.msdn.com/ericlippert/archive/2005/08/08/recursion-part-four-continuation-passing-style.aspx

http://blogs.msdn.com/ericlippert/archive/2005/08/11/recursion-part-five-more-on-cps.aspx

http://blogs.msdn.com/ericlippert/archive/2005/08/15/recursion-part-six-making-cps-work.aspx

CPS is a particularly clever way of moving the implicit stack data structure off the system stack and onto the heap by encoding it in the relationships between a bunch of delegates.
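To make that concrete, here is a small C# sketch of tree depth in continuation passing style. It is not taken from the articles: the `Tree`, `CpsDepth` and `Thunk` names are hypothetical, and because C# does not guarantee tail-call elimination, the sketch adds a "trampoline" loop that runs one step at a time instead of letting continuations call each other directly. The chain of pending work ends up as delegates on the heap.

```csharp
using System;

public sealed class Tree
{
    public Tree Left;
    public Tree Right;
}

public static class CpsDepth
{
    // A step of work: running it yields the next step, or null when finished.
    private delegate Thunk Thunk();

    public static int Depth(Tree root)
    {
        int result = 0;
        Thunk next = DepthCps(root, d => { result = d; return null; });
        while (next != null)
            next = next();            // trampoline: the system stack stays shallow
        return result;
    }

    // Instead of returning the depth, hands it to the continuation k.
    private static Thunk DepthCps(Tree t, Func<int, Thunk> k)
    {
        if (t == null)
            return () => k(0);

        return () => DepthCps(t.Left, left =>
               () => DepthCps(t.Right, right =>
               () => k(1 + Math.Max(left, right))));
    }
}
```

Every lambda returns immediately with the next step, so no call ever nests more than a couple of frames deep, whatever the shape of the tree.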

Here are all of my articles on recursion:

http://blogs.msdn.com/ericlippert/archive/tags/Recursion/default.aspx

Eric Lippert
Or ... you could make sure your algorithm can be converted into tail recursive form by the CLR.
LBushkin
C# does not ever generate the tailcall instruction. Certain versions of the jitter will notice that a particular method can be optimized using tail recursion even if the tailcall instruction is not used. However, most jitters that we've released do not have this optimization, and you should not rely upon it.
Eric Lippert
And besides, your advice is not actually actionable. *How* exactly is the original poster supposed to "make sure the algorithm can be converted into a tail recursive form"? This is a complex process that requires deep understanding of implementation details of the runtime, understanding that I don't expect anyone other than the guys in building 42 to possess.
Eric Lippert
I admit, aside from rewriting in F# (or IL) it may not be easy to achieve this reliably in C# - particularly since only the 64-bit CLR seems to perform tailcall optimization with the .tailcall instruction missing.
LBushkin
Although, having said this, I wonder if you couldn't emit the necessary IL directly via the CodeDom in C#...
LBushkin
Given that CodeDom doesn't emit IL, but rather C#/VB/... (depending on the provider), that would be quite a feat!
Pavel Minaev
A: 

I would first verify that you are actually overflowing the stack: you actually see a StackOverflowException get thrown by the runtime.

If this is indeed the case, you have a few options:

  1. Modify your recursive function so that the .NET runtime can convert it into a tail-recursive function.
  2. Modify your recursive function so that it is iterative and uses a custom data structure rather than the managed stack.

Option 1 is not always possible, and assumes that the rules the CLR uses to generate tail recursive calls will remain stable in the future. The primary benefit is that, when possible, tail recursion is actually a convenient way of writing recursive algorithms without sacrificing clarity.
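For illustration, here is what a call in tail position looks like, next to the loop it is equivalent to. The `TailCallDemo` class and its methods are hypothetical examples, not part of the question's code. Whether the runtime actually turns the first form into the second is not something this sketch assumes; writing the loop by hand removes the dependency.

```csharp
public static class TailCallDemo
{
    // Tail-recursive form: the recursive call is the very last action,
    // so nothing from this frame is needed once the call is made.
    public static long SumRecursive(long n, long accumulator)
    {
        if (n == 0)
            return accumulator;
        return SumRecursive(n - 1, accumulator + n);
    }

    // The same logic as a loop: update the "parameters" and go around again.
    public static long SumLoop(long n)
    {
        long accumulator = 0;
        while (n != 0)
        {
            accumulator += n;
            n--;
        }
        return accumulator;
    }
}
```

The loop version uses one stack frame regardless of `n`; the recursive version uses one frame per call unless the tail call happens to be optimized.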

Option 2 is more work, but it is not sensitive to the implementation of the CLR and can be implemented for any recursive algorithm (whereas tail recursion may not always be possible). Generally, you need to capture and pass state information between iterations of some loop, together with information on how to "unroll" the data structure that takes the place of the stack (typically a List<> or Stack<>). One way of unrolling recursion into iteration is through the continuation-passing pattern.

More resources on C# tail recursion:

http://stackoverflow.com/questions/491376/why-doesnt-net-c-eliminate-tail-recursion

http://geekswithblogs.net/jwhitehorn/archive/2007/06/06/113060.aspx

LBushkin