views:

949

answers:

14
+11  Q: 

Project Euler #15

Hey everyone,

Last night I was trying to solve challenge #15 from Project Euler:

Starting in the top left corner of a 2×2 grid, there are 6 routes (without backtracking) to the bottom right corner.

alt text

How many routes are there through a 20×20 grid?

I figured this shouldn't be so hard, so I wrote a basic recursive function:

const int gridSize = 20;

// call with progress(0, 0)
static int progress(int x, int y)
{
    int i = 0;

    if (x < gridSize)
        i += progress(x + 1, y);
    if (y < gridSize)
        i += progress(x, y + 1);

    if (x == gridSize && y == gridSize)
        return 1;

    return i;
}

I verified that it worked for a smaller grids such as 2×2 or 3×3, and then set it to run for a 20×20 grid. Imagine my surprise when, 5 hours later, the program was still happily crunching the numbers, and only about 80% done (based on examining its current position/route in the grid).

Clearly I'm going about this the wrong way. How would you solve this problem? I'm thinking it should be solved using an equation rather than a method like mine, but that's unfortunately not a strong side of mine.

Update:

I now have a working version. Basically it caches results obtained before when a n×m block still remains to be traversed. Here is the code along with some comments:

// the size of our grid
static int gridSize = 20;

// the amount of paths available for a "NxM" block, e.g. "2x2" => 4
static Dictionary<string, long> pathsByBlock = new Dictionary<string, long>();

// calculate the surface of the block to the finish line
static long calcsurface(long x, long y)
{
    return (gridSize - x) * (gridSize - y);
}

// call using progress (0, 0)
static long progress(long x, long y)
{
    // first calculate the surface of the block remaining
    long surface = calcsurface(x, y);
    long i = 0;

    // zero surface means only 1 path remains
    // (we either go only right, or only down)
    if (surface == 0)
        return 1;

    // create a textual representation of the remaining
    // block, for use in the dictionary
    string block = (gridSize - x) + "x" + (gridSize - y);

    // if a same block has not been processed before
    if (!pathsByBlock.ContainsKey(block))
    {
        // calculate it in the right direction
        if (x < gridSize)
            i += progress(x + 1, y);
        // and in the down direction
        if (y < gridSize)
            i += progress(x, y + 1);

        // and cache the result!
        pathsByBlock[block] = i;
    }

    // self-explanatory :)
    return pathsByBlock[block];
}

Calling it 20 times, for grids with size 1×1 through 20×20 produces the following output:

There are 2 paths in a 1 sized grid
0,0110006 seconds

There are 6 paths in a 2 sized grid
0,0030002 seconds

There are 20 paths in a 3 sized grid
0 seconds

There are 70 paths in a 4 sized grid
0 seconds

There are 252 paths in a 5 sized grid
0 seconds

There are 924 paths in a 6 sized grid
0 seconds

There are 3432 paths in a 7 sized grid
0 seconds

There are 12870 paths in a 8 sized grid
0,001 seconds

There are 48620 paths in a 9 sized grid
0,0010001 seconds

There are 184756 paths in a 10 sized grid
0,001 seconds

There are 705432 paths in a 11 sized grid
0 seconds

There are 2704156 paths in a 12 sized grid
0 seconds

There are 10400600 paths in a 13 sized grid
0,001 seconds

There are 40116600 paths in a 14 sized grid
0 seconds

There are 155117520 paths in a 15 sized grid
0 seconds

There are 601080390 paths in a 16 sized grid
0,0010001 seconds

There are 2333606220 paths in a 17 sized grid
0,001 seconds

There are 9075135300 paths in a 18 sized grid
0,001 seconds

There are 35345263800 paths in a 19 sized grid
0,001 seconds

There are 137846528820 paths in a 20 sized grid
0,0010001 seconds

0,0390022 seconds in total

I'm accepting danben's answer, because his helped me find this solution the most. But upvotes also to Tim Goodman and Agos :)

Bonus update:

After reading Eric Lippert's answer, I took another look and rewrote it somewhat. The basic idea is still the same but the caching part has been taken out and put in a separate function, like in Eric's example. The result is some much more elegant looking code.

// the size of our grid
const int gridSize = 20;

// magic.
static Func<A1, A2, R> Memoize<A1, A2, R>(this Func<A1, A2, R> f)
{
    // Return a function which is f with caching.
    var dictionary = new Dictionary<string, R>();
    return (A1 a1, A2 a2) =>
    {
        R r;
        string key = a1 + "x" + a2;
        if (!dictionary.TryGetValue(key, out r))
        {
            // not in cache yet
            r = f(a1, a2);
            dictionary.Add(key, r);
        }
        return r;
    };
}

// calculate the surface of the block to the finish line
static long calcsurface(long x, long y)
{
    return (gridSize - x) * (gridSize - y);
}

// call using progress (0, 0)
static Func<long, long, long> progress = ((Func<long, long, long>)((long x, long y) =>
{
    // first calculate the surface of the block remaining
    long surface = calcsurface(x, y);
    long i = 0;

    // zero surface means only 1 path remains
    // (we either go only right, or only down)
    if (surface == 0)
        return 1;

    // calculate it in the right direction
    if (x < gridSize)
        i += progress(x + 1, y);
    // and in the down direction
    if (y < gridSize)
        i += progress(x, y + 1);

    // self-explanatory :)
    return i;
})).Memoize();

By the way, I couldn't think of a better way to use the two arguments as a key for the dictionary. I googled around a bit, and it seems this is a common solution. Oh well.

A: 

The proper way to do this is with dynamic programming.

Ignacio Vazquez-Abrams
A: 

You can halve the calculation time by taking into account that, once you reduce it to a square, the grid's going to be symmetric. So whenever you have an equal amount of space in the X and Y direction to traverse remaining you can use the same calculation for the increased x travel and increased y travel.

That being said, I did this in python and did a lot of caching of results to avoid recalculating.

StrixVaria
+21  A: 

This can be done much faster if you use dynamic programming (storing the results of subproblems rather than recomputing them). Dynamic programming can be applied to problems that exhibit optimal substructure - this means that an optimal solution can be constructed from optimal solutions to subproblems (credit Wikipedia).

I'd rather not give away the answer, but consider how the number of paths to the lower right corner may be related to the number of paths to adjacent squares.

Also - if you were going to work this out by hand, how would you do it?

danben
Upvoted for calling attention to a very useful programming strategy. Even though for this problem I think "consider the math and then punch `40!/20!/20!` into google" is the quicker way, it's definitely nice to have both the combinatorics approach and the dynamic programming approach in one's mental toolkit.
Tim Goodman
@Tim Goodman: Why would you type all that into Google when you can simply type 40 choose 20? ;)
Aistina
@Aistina: Cool, I didn't know Google understood that syntax
Tim Goodman
A: 

Everyone indicates dynamic programming, and caching results. I have, somewhere, a Ruby script which ended up with a very big hash where all the data was stored. In truth, like most Euler project problems, this is a hidden math 'trick', and there are ways to get the result with a simple calculation.

Trevoke
In this problem, the amount of storage needed for the dynamic programming solution is no larger than the grid being solved for (really, only half that). You were probably doing it wrong.
danben
Haha, actually, I did a brain melt. I was thinking about solving one of the Euler pb that requires tons of fibonacci numbers and storing them all.You are right - this one was very painless.
Trevoke
+18  A: 

Quick No Programming Solution (based on combinatorics)

I take it "no backtracking" means we always either increase x or increase y.

If so, we know that in total we will have 40 steps to reach the finish -- 20 increases in x, 20 increases in y.

The only question is which of the 40 are the 20 increases in x. The problem amounts to: how many different ways can you choose 20 elements out of a set of 40 elements. (The elements are: step 1, step 2, etc. and we're choosing, say, the ones that are increases in x).

There's a formula for this: it's the binomial coefficient with 40 on top and 20 on the bottom. The formula is 40!/((20!)(40-20)!), in other words 40!/(20!)^2. Here ! represents factorial. (e.g., 5! = 5*4*3*2*1)

Canceling out one of the 20! and part of the 40!, this becomes: (40*39*38*37*36*35*34*33*32*31*30*29*28*27*26*25*24*23*22*21)/(20*19*18*17*16*15*14*13*12*11*10*9*8*7*6*5*4*3*2*1). The problem is thus reduced to simple arithmatic. The answer is 137,846,528,820.

For comparison, note that (4*3)/(2*1) gives the answer from their example, 6.

Tim Goodman
@danben: Dude, if I could work out the solution to this in 10 mintues with pencil and paper, I wouldn't put it in bold at top of the answer. It would be the only text in the answer and written in letters of fire, ten feet tall :)
Binary Worrier
It's okay, I'm not going to stop coding until I get my app to output the correct number.
Aistina
@danben: Sorry, I missed the fact that this is kind of a contest. I was just trying to make clear you can really do it immediately without programming. Does StackOverflow have a way to do a spoiler tag or other concealment? If not, I'm not sure how to conceal it while still answering the question clearly . . . even if I take the answer out of the top, you can just copy-paste the formula at the bottom into google. I'm not sure that's much better.
Tim Goodman
By the way, I would hope that anyone reading this would really take the time to make sure they understand the solution. Just copy-pasting someone else's answer without knowing where it comes from is only cheating yourself.
Tim Goodman
An explanation of this can also be found in AoCP Fascicle 3, 7.2.1.3. It's essentially saying, from a list of 2R, 0 and 1s we want exactly R 1's. If you think of 1's representing 'DOWN' and 0's representing 'RIGHT', it's pretty easy to see this method.
nlucaroni
@Tim Goodman - There's no spoiler tag; what I prefer to do in this kind of situation is give as much of a hint as I can without giving away the answer. In this case, I might have described some of the steps leading to the determination to use **choose** without stating it explicitly. The poster can always come back with additional questions (though you then run the risk of someone else coming in and posting the answer, hoping to steal the rep).
danben
Why not link to a blank image with alt text as the answer?
nlucaroni
For now I just took the answer out of the bolded top line, but included it below. That way, people who are just glancing at the answers but don't want the exact value won't be "spoiled" so easily. I figure this is good enough since at this point the question asker now also includes the answer in his post.
Tim Goodman
+3  A: 

While dynamic programming is certainly a correct way to solve this kind of problems, this particular instance shows a regularity that can be exploited.

You can see the problem as arranging a number of “right"s and “down"s, being wary not to count multiple times identical arrangements.
For example, the solutions of the size 2 problem (reported in the images in the question) can be see this way:

→→↓↓  
→↓→↓
→↓↓→
↓→→↓
↓→↓→
↓↓→→

So, for any grid of side n, you can find the solution by means of combinatorics:

from math import factorial
n = 20
print factorial(2*n)/(factorial(n)*factorial(n))

2n! is the number of arrangements of the 20 → + 20↓, while the two n! account for the identical ways in which the → and ↓ can be arranged.

Agos
+1  A: 

This is Pascal's Triangle. The solution is well-known.

John R. Strohm
+1  A: 

Even though dynamic programming looks like an attractive way of approaching the problem (and makes it an interesting coding challenge), a bit of creative thinking about data structures lends itself to giving an immediate answer.

[ the rest of this is essentially an explanation of why Tim Goodman's answer is best, for some value of "best" ]
If we have an nXm grid, we can represent each valid corner-to-corner route as an n+m bit-string, using either 0 or 1 to represent "down". A bit more thinking gives at hand that the exact number of routes is the number of ways to take N items from N+M items and this, conveniently, happens to be the standard simple combinatorial M over N.

So, for any N+M rectangle, the number of possible routes from the top-left to the bottom-right corner is (n+m)(n+m-1)...(m+1)/(n * (n-1) * ... 1).

The fastest program is the one that doesn't have to store very much, use much in the way of storage and (ideally) has a closed-form answer.

Vatine
+17  A: 

As others have noted, there's a discrete math solution to this particular problem. But suppose you did want to solve it recursively. Your performance problem is that you're solving the same problems over and over again.

Let me show you a little higher-order programming trick that will pay big dividends. Let's take an easier recursive problem:

long Fib(n) 
{
    if (n < 2) return 1;
    return Fib(n-1) + Fib(n-2);
}

You ask this to compute Fib(5). That computes Fib(4) and Fib(3). Computing Fib(4) computes Fib(3) and Fib(2). Computing Fib(3) computes Fib(2) and Fib(1). Computing Fib(2) computes Fib(1) and Fib(0). Now we go back and compute Fib(2) again. Then we go back and compute Fib(3) again. Huge amounts of recomputation.

Suppose we cached the results of the computation. Then the second time the computation was requested, we'd just return the cached result. Now comes the higher-order trick. I want to represent this concept of "cache the results of a function" as a function that takes in a function, and returns me a function that has this nice property. I'll write it as an extension method on functions:

static Func<A, R> Memoize(this Func<A, R> f)
{
    // Return a function which is f with caching.
    var dictionary = new Dictionary<A, R>();
    return (A a)=>
    {
        R r;
        if(!dictionary.TryGetValue(a, out r))
        { // cache miss
            r = f(a);
            dictionary.Add(a, r);
        }
        return r;
    };
}

Now we do some minor rewriting on Fib:

Func<long, long> Fib = null;
Fib = (long n) => 
{
    if (n < 2) return 1;
    return Fib(n-1) + Fib(n-2);
};

OK, we have our non-memoized function. Now, magic:

Fib = Fib.Memoize();

And boom, when we call Fib(5), now we do a dictionary lookup. 5 is not in the dictionary, so we call the original function. That calls Fib(4), which does another dictionary lookup and misses. That calls Fib(3), and so on. When we get back to calling Fib(2) and Fib(3) the second time, the results are already in the dictionary, so we do not re-compute them.

Writing a two-argument version:

static Func<A1, A2, R> Memoize(this Func<A1, A2, R>) { ... }

is not too hard and is left as an exercise. If you do that, then you can just take your original beautiful recursive logic, do a simple rewriting into a lambda, and say:

progress = progress.Memoize();

and suddenly your performance will increase, with no loss of readability of the original algorithm.

Eric Lippert
+1 Nicely explained and implemented!
Andrew Hare
Love the implementation of Memoize!
SolutionYogi
+1 Very nice solution and explanation! As you can see in my final edit, this is more or less what I ended up doing myself after some experimentation, albeit in a much less elegant manner.
Aistina
A: 

The solutions are reflected along a diagonal from NW to SE of your grid. So, you should only be calculating solutions for the top right half of the grid, and then reflecting them to get the other half...

Scott Smith
A: 

By the way, you can increase your performance even more by realizing that a 2x3 will have the same amount of paths through it as a 3x2. Your memorize function appears to only account for a string that is exactly columns x rows. However you can include in your memorize to put the total paths for the key of 2x3 as well as 3x2.

So when you memorize 4x2 etc it will automatically fill in 2x4 with the same amount of paths. This will cut your time way down as you have already calculated all paths through that surface area once before so why do it again?

bmucklow
A: 

My solution was niaive but quite easy to understand:

The number of routes to a given intersection on a grid is the sum of the number of routes to its two neighbours.

Given that there is only one route to each of the points on the top and left edges it's quite easy to iterate over the remaining points and fill in the blanks.

for x or y = 0: grid[x,y] = 1
for x and y >=1: grid[x,y] = grid[x-1,y] + grid[x, y-1]

So after iterating over all the squares the final answer is contained in grid[20,20].

Phil
+1  A: 

You are in fact computing Catalan Numbers for which a closed formula using Taylor series is available.

So one program computing the solution could compute binomial coefficients, which are tricky to get right if you don't have a BigInt class...

Alexandre C.
A: 

I believe that some high school mathematics would be useful here, this link explains the combination formula required:

http://mathworld.wolfram.com/Combination.html

Now, using this to find the number of paths through a square grid, the formula becomes 2n choose n. As a warning, you will need to use a datatype which can hold a fairly large number

njak32