views:

633

answers:

1

This question is about the way that Scala does pattern matching and recursion with lists and the performance thereof.

If I have a function that recurses over a list and I do it with matching on a cons, for example something like:

def myFunction(xs) = xs match { 
  case Nil => Nil
  case x :: xs => «something» myFunction(xs)
}

in Haskell:

myFunction [] = []
myFunction (x:xs) = «something» : myFunction xs

I'm using the same semantics as I would do with, for example, Haskell. I don't think there would be any question about the Haskell implementation as that's simply the way that you deal with lists. For a long list (I would be operating on a list with a few thousand nodes), Haskell wouldn't blink (I imagine though I've never tried).

But from what I understand with Scala, the match statement would call the unapply extractor method to split the list around the cons and, to extend the example to a function that does nothing to the list:

def myFunction(xs) = xs match { 
  case Nil => Nil
  case x :: xs => x :: myFunction(xs)
}

in Haskell:

myFunction [] = []
myFunction (x:xs) = x : myFunction xs

it would call apply extractor method to cons it back together. For a long list I imagine this would be very expensive.

To illustrate, in my specific case I want to recurse over a list of characters and accumulate various things, where the input string is anything up to a few tens of kilobytes.

Will I really be calling constructors and extractors for each step of the recursion if I want to recurse over a long list? Or are there optimisations? Or better ways to do it? In this case I would need several accumulator variables and obviously I wouldn't just be recursing over a list doing nothing...

(Please excuse my Haskell, I've not written a line for two years)

(and yes, I'm going for tail recursion)

Thanks in advance

Joe

+8  A: 

First, Haskell is non-strict, so these function calls on the tail might never be evaluated at all. Scala, on the other hand, will compute all of the list before returning. A closer implementation to what happens in Haskell would be this:

def myFunction[T](l: List[T]): Stream[T] = l match {   
  case Nil => Stream.empty  
  case x :: xs => x #:: myFunction(xs)
}

That receives a List, which is strict, and returns a Stream which is non-strict.

Now, if you want to avoid pattern matching and extractors (though none is called in this particular case -- see below), you might just do this:

def myFunction[T](xs: List[T]): Stream[T] =
  if (l.isEmpty) Stream.empty else l.head #:: myFunction(l.tail)

I just realized you intend to go for tail recursion. What you wrote is not tail-recursive, because you prepend x to the result of the recursion. When handling lists, you'll get tail recursion if you compute the results backwards and then invert:

def myFunction[T](xs: List[T]): List[T] = {
  def recursion(input: List[T], output: List[T]): List[T] = input match {
    case x :: xs => recursion(xs, x :: output)
    case Nil => output
  }
  recursion(xs, Nil).reverse
}

Last, let's decompile an example to see how it works:

class ListExample {
  def test(o: Any): Any = o match {
    case Nil => Nil
    case x :: xs => xs
    case _ => null
  }
}

Generates:

public class ListExample extends java.lang.Object implements scala.ScalaObject{
public ListExample();
  Code:
   0:   aload_0
   1:   invokespecial   #10; //Method java/lang/Object."<init>":()V
   4:   return

public java.lang.Object test(java.lang.Object);
  Code:
   0:   aload_1
   1:   astore_2
   2:   getstatic       #18; //Field scala/Nil$.MODULE$:Lscala/Nil$;
   5:   aload_2
   6:   invokestatic    #24; //Method scala/runtime/BoxesRunTime.equals:(Ljava/lang/Object;Ljava/lang/Object;)Z
   9:   ifeq    18
   12:  getstatic       #18; //Field scala/Nil$.MODULE$:Lscala/Nil$;
   15:  goto    38
   18:  aload_2
   19:  instanceof      #26; //class scala/$colon$colon
   22:  ifeq    35
   25:  aload_2
   26:  checkcast       #26; //class scala/$colon$colon
   29:  invokevirtual   #30; //Method scala/$colon$colon.tl$1:()Lscala/List;
   32:  goto    38
   35:  aconst_null
   36:  pop
   37:  aconst_null
   38:  areturn

public int $tag()   throws java.rmi.RemoteException;
  Code:
   0:   aload_0
   1:   invokestatic    #42; //Method scala/ScalaObject$class.$tag:(Lscala/ScalaObject;)I
   4:   ireturn

}

Decoding, it first calls the method equals on the passed parameter and the object Nil. Returns the latter if true. Otherwise, it calls instanceOf[::] on the object. If true, it casts the object to that, and invokes the method tl on it. Failing all that, loads the cosntant null and returns it.

So, you see, x :: xs is not calling any extractor.

As for accumulating, there's another pattern you might want to consider:

val list = List.fill(100)(scala.util.Random.nextInt)
case class Accumulator(negative: Int = 0, zero: Int = 0, positive: Int = 0)
val accumulator = list.foldLeft(Accumulator())( (acc, n) => 
  n match {
    case neg if neg < 0 => acc.copy(negative = acc.negative + 1)
    case zero if zero == 0 => acc.copy(zero = acc.zero + 1)
    case pos if pos > 0 => acc.copy(positive = acc.positive + 1)
  })

The default parameters and copy methods are a Scala 2.8 feature I used just to make the example simpler, but the point is using the foldLeft method when you want to accumulate things over a list.

Daniel
Thank you! Yes, what I am writing will be tail-recursive, the second snippet was a bad example! What I'm really aiming to do is partitioning the input string into various chunks based on some non-trivial function: the result is the accumulators not any map or fold style output.Perhaps I was trying to make the question too general in the search for a general pattern that I can use in this specific case.
Joe
Well, take a look at the last example I added.
Daniel
Perfect, that answers both facets. Thank you very much!
Joe