views:

844

answers:

4

I'm aware that fold-left produces left-leaning trees and fold-right produces right-leaning trees, but when I reach for a fold, I sometimes find myself getting bogged down in headache-inducing thought trying to determine which kind of fold is appropriate. I usually end up unwinding the entire problem and stepping through the implementation of the fold function as it applies to my problem.

So what I want to know is:

  • What are some rules of thumb for determining whether to fold left or fold right?
  • How can I quickly decide which type of fold to use given the problem I'm facing?

There is an example in Scala by Example (PDF) of using a fold to write a function called flatten which concatenates a list of element lists into a single list. In that case, a right fold is the proper choice (given the way the lists are concatenated), but I had to think about it a bit to arrive at that conclusion.

Since folding is such a common action in (functional) programming, I'd like to be able to make these kinds of decisions quickly and confidently. So... any tips?

+9  A: 

You can transfer a fold into an infix operator notation (writing in between):

This example fold using the accumulator function x

fold x [A, B, C, D]

thus equals

A x B x C x D

Now you just have to reason about the associativity of your operator (by putting parentheses!).

If you have a left-associative operator, you'll set the parentheses like this

((A x B) x C) x D

Here, you use a left fold. Example (haskell-style pseudocode)

foldl (-) [1, 2, 3] == (1 - 2) - 3 == 1 - 2 - 3 // - is left-associative

If your operator is right-associative (right fold), the parentheses would be set like this:

A x (B x (C x D))

Example: Cons-Operator

foldr (::) [] [1, 2, 3] == 1 :: (2 :: (3 :: [])) == 1 :: 2 :: 3 :: [] == [1, 2, 3]

In general, arithmetic operators (most operators) are left-associative, so foldl is more widespread. But in the other cases, infix notation + parentheses is quite useful.

Dario
Well, what you described are actually `foldl1` and `foldr1` in Haskell (`foldl` and `foldr` take an external initial value), and Haskell's "cons" is called `(:)` not `(::)`, but otherwise this is correct. You may want to add that Haskell additionally provides a `foldl'`/`foldl1'` which are strict variants of `foldl`/`foldl1`, because lazy arithmetic is not always desirable.
ephemient
Sorry, I thought I saw a "Haskell" tag on this question, but it's not there. My comment doesn't really make that much sense if it's not Haskell...
ephemient
+2  A: 

It's also worth noting (and I realise this is stating the obvious a bit), in the case of a commutative operator the two are pretty much equivalent. In this situation a foldl might be the better choice:

foldl: (((1 + 2) + 3) + 4) can calculate each operation and carry the accumulated value forward

foldr: (1 + (2 + (3 + 4))) needs a stack frame to be opened for 1 + ? and 2 + ? before calculating 3 + 4, then it needs to go back and do the calculation for each.

I'm not enough of an expert on functional languages or compiler optimisations to say whether this is will actually make a difference but it certainly seems cleaner to use a foldl with commutative operators.

The extra stack frames will definitely make a difference for large lists. If your stack frames exceed the size of the processor's cache, then your cache-misses are going to affect performance. Unless the list is doubly-linked, it is kind of hard to make foldr a tail-recursive function, so you should use foldl unless there is a reason not to.
A. Levy
The lazy nature of Haskell muddles this analysis. If the function being folded is non-strict in the second parameter, then `foldr` can very well be more efficient than `foldl`, and will not require any extra stack frames.
ephemient
Sorry, I thought I saw a "Haskell" tag on this question, but it's not there. My comment doesn't really make that much sense if it's not Haskell...
ephemient
+3  A: 

Olin Shivers differentiated them by saying "foldl is the fundamental list iterator" and "foldr is the fundamental list recursion operator." If you look at how foldl works:

((1 + 2) + 3) + 4

you can see the accumulator (as in a tail-recursive iteration) being built. In contrast, foldr proceeds:

1 + (2 + (3 + 4))

where you can see the traversal to the base case 4 and building up the result from there.

So I posit a rule of thumb: if it looks like a list iteration, one that would be simple to write in tail-recursive form, foldl is the way to go.

But really this will be probably be most evident from the associativity of the operators you are using. If they are left-associative, use foldl. If they are right-associative, use foldr.

Steven Huwig
+3  A: 

Other posters have given good answers and I won't repeat what they've already said. As you have given a Scala example in your question, I'll give a Scala specific example. As Tricks already said, a foldRight needs to preserve n-1 stack frames, where n is the length of your list and this can easily lead to a stack overflow - and not even tail recursion could save you from this.

A List(1,2,3).foldRight(0)(_ + _) would reduce to:

1 + List(2,3).foldRight(0)(_ + _)        // first stack frame
    2 + List(3).foldRight(0)(_ + _)      // second stack frame
        3 + 0                            // third stack frame 
// (I don't remember if the JVM allocates space 
// on the stack for the third frame as well)

while List(1,2,3).foldLeft(0)(_ + _) would reduce to:

(((0 + 1) + 2) + 3)

which can be iteratively computed, as done in the implementation of List.

In a strictly evaluated language as Scala, a foldRight can easily blow the stack up for large lists, while a foldLeft won't.

Example:

scala> List.range(1, 10000).foldLeft(0)(_ + _)
res1: Int = 49995000

scala> List.range(1, 10000).foldRight(0)(_ + _)
java.lang.StackOverflowError
        at scala.List.foldRight(List.scala:1081)
        at scala.List.foldRight(List.scala:1081)
        at scala.List.foldRight(List.scala:1081)
        at scala.List.foldRight(List.scala:1081)
        at scala.List.foldRight(List.scala:1081)
        at scala.List.foldRight(List.scala:1081)
        at scala.List.foldRight(List.scala:1081)
        at scala.List.foldRight(List.scala:1081)
        at scala.List.foldRig...

My rule of thumb is therefore - for operators that don't have a specific associativity, always use foldLeft, at least in Scala. Otherwise, go with other advice given in the answers ;).

-- Flaviu Cipcigan

Flaviu Cipcigan