views:

314

answers:

3

I've been learning some Haskell, and doing project Euler problems as I go. I'm not really bothered about the answer to the Euler Problem (which I can happily brute force, or do it in some other language), but the method.

I'm stuck on doing Problem 15 in reasonable time. It asks for the number of non-backtracking routes from top-left to the bottom-right of a 20x20 grid. Link here

I'd say it's reasonably obvious that the idea is to memoize (sp?) the function, but I'm not reall sure how to do it. I googled, and came across this on Haskell Cafe that I naively tried to adapt, but ended up with:

memRoute :: (Int,Int) -> Int
memRoute (x,y) = fromJust $ lookup (x,y) $ map (\t -> (t,route t)) [(x,y) | x <- [0..20], y <- [0..20]]

route (0,0) = 0
route (_,0) = 1
route (0,_) = 1
route (x,y) = memRoute (x-1,y) + memRoute (x,y-1)

Which looks bad, and performs horribly (a lot slower than a naive version). The problem is that I don't really understand how Haskell memoization works.

Using my code as an example, would anyone care to explain a) why mine is so slow
b) how it should be done (without using mutables, which was a solution I came across)
c) how the memoization works in this case?

+1  A: 

Toss in a couple of trace points

memRoute :: (Int,Int) -> Int
memRoute (x,y) = trace ("mem: " ++ show (x,y))
                 fromJust $
                 lookup (x,y) $
                 map (\t -> (t,route t))
                 [(x,y) | x <- [0..20], y <- [0..20]]

route (0,0) = 0
route (_,0) = 1
route (0,_) = 1
route (x,y) = trace ("route: " ++ show (x,y))
              memRoute (x-1,y) + memRoute (x,y-1)

to see that you haven't memoized at all.

ghci> memRoute (2,2)
mem: (2,2)
route: (2,2)
mem: (1,2)
route: (1,2)
mem: (0,2)
mem: (1,1)
route: (1,1)
mem: (0,1)
mem: (1,0)
mem: (2,1)
route: (2,1)
mem: (1,1)
route: (1,1)
mem: (0,1)
mem: (1,0)
mem: (2,0)
6

Reusing subcomputations is dynamic programming.

import Data.Array

routes :: (Int,Int) -> Int
routes = (rte !)
  where rte = array ((1,1), (n,n))
                    [ ((x,y), route (x,y)) | x <- [1 .. n],
                                             y <- [1 .. n] ]
        route (1,1) = 0
        route (_,1) = 1
        route (1,_) = 1
        route (x,y) = rte ! (x-1,y) + rte ! (x,y-1)
        n = 20

Note that the algorithm is incorrect, but at least it's easy to get a bad answer quickly!

Greg Bacon
+4  A: 

In my opinion, "memoization" is highly overrated. There is no one-size-fits-all memoization technique (in any programming language) that turns every single-case calculation into a general calculation. You have to understand each problem, and organize things to control the amount of memory you need to use.

In this case, to get the number of paths for an n x m rectangle, you don't need to remember the totals for all smaller rectangles, just for the rectangles that are one step smaller in either direction. So you can build up row by row, forgetting all of the totals for smaller rectangles as you go.

In Haskell, laziness is perfect for this kind of situation. It relieves you of all the work of keeping of track of exactly what to remember and what to forget - just compute an infinite grid of the totals for all possible rectangles, and let Haskell do the rest of the work.

For zero rows, you have only the bottom line. No matter how long it is, there is only a single path to the end, so the numbers of paths are repeat 1.

To compute a row in the grid from the previous row, you start with 1 (only one path straight down, no matter how high you are), then at each step you add together the corresponding entry in the previous row and the previous step in the current row. Altogether, we have:

iterate (scanl (+) 1 . tail) (repeat 1) !! 20 !! 20

That computes the answer instantaneously for me at the GHCi prompt.

Yitz
Whilst I appreciate the elegance of your answer, (and I certainly hadn't thought of doing it like that), I was really trying to use this Euler problem as a vehicle for getting a better grip on memoization in Haskell, rather than solve the problem itself.
MrBones
That's exactly my point. This *is* one way to do memoization in Haskell. When you create a lazy data structure, Haskell automatically memoizes the parts of it that are actually computed until they are no longer needed, then it garbage collects them.
Yitz
I'll add that if what we were looking for were the best solution to the Project Euler problem, even this simpler kind of memoization is the wrong approach. It's not hard to prove that `numPaths a b == comb (a+b) b`, where `comb` is the usual combinations function from combinatorics. In practice, thinking of "memoization" as a programming technique is a bad idea in any programming language. It takes the focus of your thinking away from the task at hand. Use your brain and solve the problem, and let memory usage happen naturally.
Yitz
@MrBones: In fact, this is exactly what people usually mean by "memoization" in Haskell: Build an infinite lazy data structure that contains the result for every possible combination of arguments, replacing recursive calls with indexing into the data structure itself. The standard memoizing list example (e.g., fibonacci) is just the special case of a function with one, non-negative integer argument. Yitz's algorithm is likewise a special case for the particular recursive structure of this problem.
camccann
A: 

What happens is that your memoization table is recomputed on every call to memRoute. Not in its entirity (fortunately!) but at least the work done in one call is not shared with the rest. The simplest rewrite I could come up with was:

memRoute = let table = map (\t -> (t,route t)) [(x,y) | x <- [0..20], y <- [0..20]]
           in  \(x,y) -> fromJust $ lookup (x,y) $ table

That stays really close to your expressed intent, but I think there are better ways of doing memoization, by using a Data.Map and letting the actual call pattern determine what values are memoized.

yatima2975