views:

153

answers:

3

The canonical implementation of length :: [a] -> Int is:

length [] = 0
length (x:xs) = 1 + length xs

which is very beautiful but suffers from stack overflow as it uses linear space.

The tail-recursive version:

length xs = length' xs 0
  where length' [] n = n
        length' (x:xs) n = length xs (n + 1)

doesn't suffer from this problem, but I don't understand how this can run in constant space in a lazy language.

Isn't the runtime accumulating numerous (n + 1) thunks as it moves through the list? Shouldn't this function Haskell to consume O(n) space and lead to stack overflow?

(if it matters, I'm using GHC)

+1  A: 

A tail-recursive function doesn't need to maintain a stack, since the value returned by the function is simply going to be the value returned by the tail call. So instead of creating a new stack frame, the current one gets re-used, with the locals overwritten by the new values passed into the tail call. So every n+1 gets written into the same place where the old n was, and you have constant space usage.

Edit - Actually, as you've written it, you're right, it'll thunk the (n+1)s and cause an overflow. Easy to test, just try length [1..1000000].. You can fix that by forcing it to evaluate it first: length xs $! (n+1), which will then work as I said above.

tzaman
The questioner's code does indeed overflow the stack. Generally speaking, in Haskell tail recursion does **not** do what you expect it to do, and lazy tail-recursive functions are often bugs. See here: http://www.haskell.org/haskellwiki/Foldr_Foldl_Foldl' Rule of thumb: either make it strict like `foldl'`, or make it recurse into a constructor like `foldr`.
camccann
Aw, nuts. Markdown doesn't like apostrophes at the end of a URL, apparently. Let's try that again: http://www.haskell.org/haskellwiki/Foldr_Foldl_Foldl%27
camccann
+10  A: 

Running your second version in GHCi:

> length [1..1000000]
*** Exception: stack overflow

So to answer your question: Yes, it does suffer from that problem, just as you expect.

However, GHC is smarter than the average compiler; if you compile with optimizations turned out, it'll fix the code for you and make it work in constant space.

More generally, there are ways to force strictness at specific points in Haskell code, preventing the building of deeply nested thunks. A usual example is foldl vs. foldl':

len1 = foldl (\x _ -> x + 1) 0
len2 = foldl' (\x _ -> x + 1) 0

Both functions are left folds that do the "same" thing, except that foldl is lazy while foldl' is strict. The result is that len1 dies with a stack overflow in GHCi, while len2 works correctly.

camccann
+10  A: 
Norman Ramsey