tags:

views:

158

answers:

2

I was reading about fold techniques in Programming in Scala book and came across this snippet:

def reverseLeft[T](xs:List[T]) = (List[T]() /: xs) {
    (y,ys) => ys :: y
}

As you can see, it was done using foldLeft or /: operator. Curious how it would look like if I did it using :\, I came up with this:

def reverseRight[T](xs:List[T]) = (xs :\ List[T]()) {
    (y,ys) => ys ::: List(y)
}

As I understand it, ::: doesn't seem to be as fast as :: and has a linear cost depending on the size of the operand list. Admittedly, I don't have a background in CS and no prior FP experience. So my questions are:

  • How do you recognise/distinguish between foldLeft/foldRight in problem approaches?
  • Is there a better way of doing this without using :::?
A: 

Operations on a List are intentionally not symmetric. The List data structure is a singly-linked list where each node (both data and pointer) are immutable. The idea behind this data structure is that you perform modifications on the front of the list by taking references to internal nodes and adding new nodes that point to them -- different versions of the list will share the same nodes for the end of the list.

The ::: operator which appends a new element on to the end of the list has to create a new copy of the entire list, because otherwise it would modify other lists that share nodes with the list you're appending to. This is why ::: takes linear time. Scala has a data structure called a ListBuffer that you can use instead of the ::: operator to make appending to the end of a list faster. Basically, you create a new ListBuffer and it starts with an empty list. The ListBuffer maintains a list completely separate from any other list that the program knows about, so it's safe to modify it by adding on to the end. When you're finished adding on to the end, you call ListBuffer.toList, which releases the list into the world, at which point you can no longer add on to the end without copying it.

foldLeft and foldRight also share a similar assymmetry. foldRight requires you to walk the entire list to get to the end of the list, and keep track of everywhere you've visited on the way there, so that you an visit them in reverse order. This is usually done recursively, and it can lead to foldRight causing stack overflows on large lists. foldLeft on the other hand, deals with nodes in the order they appear in the list, so it can forget the ones it's visited already and only needs to know about one node at a time. Though foldLeft is also usually implemented recursively, it can take advantage of an optimization called tail recursion elimination, in which the compiler transforms the recursive calls into a loop because the function doesn't do anything after returning from the recursive call. Thus, foldLeft doesn't overflow the stack even on very long lists. EDIT: foldRight in Scala 2.8 is actually implemented by reversing the list and running foldLeft on the reversed list -- so the tail recursion issue is not an issue -- both data structures optimize tail recursion correctly, and you could choose either one (You do get into the issue now that you're defining reverse in terms of reverse -- you don't need to worry if you're defining your own reverse method for the fun of it, but you wouldn't have the foldRight option at all if you were defining Scala's reverse method.)

Thus, you should prefer foldLeft and :: over foldRight and :::.

(In an algorithm that would combine foldLeft with ::: or foldRight with ::, then you need to make a decision for yourself about which is more important: stack space or running time. Or you should use foldLeft with a ListBuffer.)

Ken Bloom
This is a totally false dichotomy in a strict implementation of foldRight. To fold a list associating to the right using linear recursion is actually *slower* than reversing the list and folding left. As implemented in the standard library, there's no reason to ever use foldRight on a List.
Apocalisp
These issues were addressed in detail in a long thread on the Scala-Debate mailing list a couple of months ago: http://bit.ly/cwWzkP
Randall Schulz
I just checked the source code: `foldRight=reversed.foldLeft`, so I'll update my answer.
Ken Bloom
@Ken, yes, I was aware of the mutable `ListBuffer` class. I was actually looking for a better FP approach than I used. Thanks.
SRI
@Randall, thanks for the link. Now I know I won't use foldRight when I have foldLeft. :)
SRI
@SRI, most LISP implementations have an automatic optimization called "tail recursion modulo cons", which you would implement manually in Scala using the `ListBuffer` (so I wouldn't call this an entirely un-functional way to program.)
Ken Bloom
+6  A: 

Since foldRight on List in the standard library is strict and implemented using linear recursion, you should avoid using it, as a rule. An iterative implementation of foldRight would be as follows:

def foldRight[A,B](f: (A, B) => B, z: B, xs: List[A]) =
  xs.reverse.foldLeft(z)((x, y) => f(y, x))

A recursive implementation of foldLeft could be this:

def foldLeft[A,B](f: (B, A) => B, z: B, xs: List[A]) =
  xs.reverse.foldRight(z)((x, y) => f(y, x))

So you see, if both are strict, then one or the other of foldRight and foldLeft is going to be implemented (conceptually anyway) with reverse. Since the way lists are constructed with :: associates to the right, the straightforward iterative fold is going to be foldLeft, and foldRight is simply "reverse then foldLeft".

Intuitively, you might think that this would be a slow implementation of foldRight, since it folds the list twice. But:

  1. "Twice" is a constant factor anyway, so it's asymptotically equivalent to folding once.
  2. You have to go over the list twice anyway. Once to push computations onto the stack and again to pop them off the stack.
  3. The implementation of foldRight above is faster than the one in the standard library.
Apocalisp
The implmenntation of `foldRight` above **is** the one in the Standard Library for Scala 2.8. See https://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src///library/scala/collection/TraversableOnce.scala#L194
Ken Bloom
No it isn't. https://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/library/scala/collection/LinearSeqOptimized.scala#L129
Apocalisp
Thanks a lot! But I have one (slightly tangential) question - How do you say this implementation is faster than the one in standard library? I have tried doing something like this in the past, but have been burnt by badly constructed tests and JVM warmup issues. Link, resources would be helpful. That said, I'm inclined to avoiding foldRight in favour of foldLeft.
SRI