views:

115

answers:

1

I have a function

myLength = foldl (\ x _ -> x + 1) 0

which fails with stack overflow with input around 10^6 elements (myLength [1..1000000] fails). I believe that is due to the thunk build up since when I replace foldl with foldl', it works. So far so good.

But now I have another function to reverse a list :

myReverse = foldl (\ acc x -> x : acc) []

which uses the lazy version foldl (instead of foldl')

When I do myLength . myReverse $ [1..1000000]. This time it works fine. I fail to understand why foldl works for the later case and not for former?

To clarify here myLength uses foldl' while myReverse uses foldl

+2  A: 

Here's my best guess, though I'm no expert on Haskell internals (yet).

While building the thunk, Haskell allocates all the intermediate accumulator variables on the heap.

When performing the addition as in myLength, it needs to use the stack for intermediate variables. See this page. Excerpt:

The problem starts when we finally evaluate z1000000:

Note that z1000000 = z999999 + 1000000. So 1000000 is pushed on the stack. Then z999999 is evaluated.

Note that z999999 = z999998 + 999999. So 999999 is pushed on the stack. Then z999998 is evaluated:

Note that z999998 = z999997 + 999998. So 999998 is pushed on the stack. Then z999997 is evaluated:

However, when performing list construction, here's what I think happens (this is where the guesswork begins):

When evaluating z1000000:

Note that z1000000 = 1000000 : z999999. So 1000000 is stored inside z1000000, along with a link (pointer) to z999999. Then z999999 is evaluated.

Note that z999999 = 999999 : z999998. So 999999 is stored inside z999999, along with a link to z999998. Then z999998 is evaluated.

etc.

Note that z999999, z999998 etc. changing from a not-yet-evaluated expression into a single list item is an everyday Haskell thing :)

Since z1000000, z999999, z999998, etc. are all on the heap, these operations don't use any stack space. QED.

Artelius
Actually, both arguments to `(:)` are stored as pointers, not just the tail. In other words `(+)` is strict in both it arguments (they need to be fully evaluated), but `(:)` is lazy in its arguments (they can be thunks).
Tom Lokhorst
That says it nicely.
Artelius
Thanks a lot!!! Any pointers/links to understand thunks (lazy eval) better.
Abhinav Kaushik
Hmm, two downvotes but nobody posts a better answer?
Artelius