tags:

views:

490

answers:

3

Trying to understand the Scala quick sort example from Wikipedia. I wonder if anyone could disassemble the sample step by step and give explanation to all the syntactic sugar involved?

Apologies if the question makes me appear lazy, but I'm sure the effort will be appreciated by many other Scala newbies.

def qsort: List[Int] => List[Int] = {
  case Nil => Nil
  case pivot :: tail =>
    val (smaller, rest) = tail.partition(_ < pivot)
    qsort(smaller) ::: pivot :: qsort(rest)
}

As much as I can gather at this stage qsort is a function that takes no parameters and returns a new Function1[List[Int],List[Int]] that implements quick sort through usage of pattern matching, list manipulation and recursive calls. But I can't quite figure out where the pivot comes from, and how exactly the pattern matching syntax works in this case.

UPDATE:

Thanks everyone for the great explanations!

Just wanted to share another example of quick sort implementation which I have discovered in the Scala by Example by Martin Odersky. Although based around arrays instead of lists and less of a show-off in terms of varios Scala features I personally find it much less convoluted than its Wikipedia counterpart, and just so much more clear and to the point expression of the underlying algorithm:

def sort(xs: Array[Int]): Array[Int] = {
    if (xs.length <= 1) xs
    else {
          val pivot = xs(xs.length / 2)
          Array.concat(
                       sort(xs filter (pivot >)),
                       xs filter (pivot ==),
                       sort(xs filter (pivot <)))
         }
}
+4  A: 

The pivot in this pattern matching example is the first element of the list:

scala> List(1,2,3) match {
     |     case x :: xs => println(x)
     |     case _ => println("empty")
     | }
1

The pattern matching is based on extractors and the cons is not part of the language. It uses the infix syntax. You can also write

scala> List(1,2,3) match {
     |     case ::(x,xs) => println(x)
     |     case _ => println("empty")
     | }
1

as well. So there is a type :: that looks like the cons operator. This type defines how it is extracted:

final case class ::[B](private var hd: B, private[scala] var tl: List[B]){ ... }

It's a case class so the extractor will be generated by the Scala compiler. Like in this example class A.

case class A(x : Int, y : Int)

A(1,2) match { case x A y => printf("%s %s", x, y)}

-> 1 2

Based on this machinary patterns matching is supported for Lists, Regexp and XML.

Thomas Jung
+15  A: 
def qsort: List[Int] => List[Int] = {
  case Nil => Nil
  case pivot :: tail =>
    val (smaller, rest) = tail.partition(_ < pivot)
    qsort(smaller) ::: pivot :: qsort(rest)
}

let's pick apart a few bits.

Naming

Operators (such as * or +) are valid candidates for method and class names in Scala (hence you can have a class called :: (or a method called :: for that matter - and indeed both exist). Scala appears to have operator-overloading but in fact it does not: it's merely that you can declare a method with the same name.

Pattern Matching

target match {
  case p1 => 
  case p2 => 
}

Where p1 and p2 are patterns. There are many valid patterns (you can match against Strings, types, particular instances etc). You can also match against something called an extractor. An extractor basically extracts arguments for you in the case of a match, so:

target match {
  case MyExtractor(arg1, arg2, arg3) => //I can now use arg1, arg2 etc
}

In scala, if an extractor (of which a case class is an example) exists called X, then the pattern X(a, b) is equivalent to a X b. The case class :: has a constructor taking 2 arguments and putting this together we get that:

case x :: xs =>
case ::(x, xs) =>

Are equivalent. This match says "if my List is an instance of :: extract the value head into x and tail into xs". pattern-matching is also used in variable declaration. For example, if p is a pattern, this is valid:

val p = expression

This why we can declare variables like:

val x :: xs = List(1, 2, 3)
val (a, b) = xs.partition(_ % 2 == 0 ) //returns a Tuple2 which is a pattern (t1, t2)

Anonymous Functions

Secondly we have a function "literal". tail is an instance of List which has a method called partition which takes a predicate and returns two lists; one of those entries satisfying the predicate and one of those entries which did not.

val pred = (el: Int) => e < 2

Declares a function predicate which takes an Int and returns true iff the int value is less than 2. There is a shorthand for writing functions inline

tail.partition(_ < pivot) // _ is a placeholder for the parameter
tail.partition( (e: Int) => e < pivot )

These two expressions mean the same thing.

Lists

A List is a sealed abstract class with only two implementations, Nil (the empty list) and :: (also called cons), which is a non-empty list consisting of a head and a tail (which is also a list). You can now see that the pattern match is a match on whether the list is empty or not. a List can be created by cons-ing it to other lists:

val l = 1 :: 2 :: Nil
val m = List(1, 2, 3) ::: List(4, 5, 6)

The above lines are simply method calls (:: is a valid method name in scala). The only difference between these and normal method calls is that, if a method end in a colon : and is called with spaces, the order of target and parameter is reversed:

a :: b === b.::(a)

Function Types

val f: A => B 

the previous line types the reference f as a function which takes an A and returns a B, so I could then do:

val a = new A
val b: B = f(a)

Hence you can see that def qsort: List[Int] => List[Int] declares a method called qsort which returns a function taking a List[Int] and returning a List[Int]. So I could obviously do:

val l = List(2, 4, 1)
val m = qsort.apply(l) //apply is to Function what run is to Runnable
val n = qsort(l) //syntactic sugar - you don't have to define apply explicitly!

Recursion

When a method call is tail recursive, Scala will optimize this into the iterator pattern. There was a msitake in my original answer because the qsort above is not tail-recursive (the tail-call is the cons operator)

oxbow_lakes
+1 for the anonymous function being passed to partition! I missed that one. Let me edit my answer... :-)
Daniel
So... how does it feel explaining how Scala works at length? To me, it always reinforced the feeling that Scala is _nice_. :-)
Daniel
@Daniel - yes. I used to be *very* suspicious of the "look how cool quicksort is in my language" brigade, in the sense that I have never, ever been required to write the quicksort algorithm in my entire career. But it is relevant, I now think, assuming that the language also has libraries toi do the *heavy-lifting* of enterprise apps (ie. DB access, netwrorking etc)
oxbow_lakes
Oxbow_lakes, thanks, really appreciated! Yours and Daniel's are two excellent answers and I found it hard to select between the two. But since qsort is not really tail recursive and you mention the tail recursion I think it's only fair to go for Daniel's reply. I wish the two were merged into one.
Totophil
I was never too hot on tail recursion! I must protest though; Daniel has enough Scala points already
oxbow_lakes
Protest you may, but you still hold 4 more bronze medals than Daniel. :-)
Totophil
+10  A: 
def qsort: List[Int] => List[Int] = { 
  case Nil => Nil 
  case pivot :: tail => 
    val (smaller, rest) = tail.partition(_ < pivot) 
    qsort(smaller) ::: pivot :: qsort(rest) 
}

Let's rewrite that. First, replace the function literal with an instance of Function1:

def qsort: List[Int] => List[Int] = new Function1[List[Int], List[Int]] {
  def apply(input: List[Int]): List[Int] = input match {
    case Nil => Nil 
    case pivot :: tail => 
      val (smaller, rest) = tail.partition(_ < pivot) 
      qsort(smaller) ::: pivot :: qsort(rest) 
  }
}

Next, I'm going to replace the pattern match with equivalent if/else statements. Note that they are equivalent, not the same. The bytecode for pattern matches are more optimized. For instance, the second if and the exception throwing below do not exist, because the compile knows the second match will always happen if the first fails.

def qsort: List[Int] => List[Int] = new Function1[List[Int], List[Int]] {
  def apply(input: List[Int]): List[Int] = if (input == Nil) {
    Nil
  } else if (input.isInstanceOf[::[_]] && 
             scala.collection.immutable.::.unapply(input.asInstanceOf[::[Int]]) != None) {
    val unapplyResult = scala.collection.immutable.::.unapply(input.asInstanceOf[::[Int]]).get
    val pivot = unapplyResult._1
    val tail = unapplyResult._2
    val (smaller, rest) = tail.partition(_ < pivot) 
    qsort(smaller) ::: pivot :: qsort(rest) 
  } else {
    throw new scala.MatchError(input)
  }
}

Actually, val (smaller, rest) is pattern match as well, so Let's decompose it as well:

def qsort: List[Int] => List[Int] = new Function1[List[Int], List[Int]] {
  def apply(input: List[Int]): List[Int] = if (input == Nil) {
    Nil
  } else if (input.isInstanceOf[::[_]] && 
             scala.collection.immutable.::.unapply(input.asInstanceOf[::[Int]]) != None) {
    val unapplyResult0 = scala.collection.immutable.::.unapply(input.asInstanceOf[::[Int]]).get
    val pivot = unapplyResult0._1
    val tail = unapplyResult0._2
    val tmp0 = tail.partition(_ < pivot)
    if (Tuple2.unapply(tmp0) == None)
      throw new scala.MatchError(tmp0)
    val unapplyResult1 = Tuple2.unapply(tmp0).get
    val smaller = unapplyResult1._1
    val rest = unapplyResult1._2
    qsort(smaller) ::: pivot :: qsort(rest) 
  } else {
    throw new scala.MatchError(input)
  }
}

Obviously, this is highly unoptmized. Even worse, there are some function calls being done more than once, which doesn't happen in the original. Unfortunately, to fix that would require some structural changes to the code.

There's still some syntactic sugar here. There is an anonymous function being passed to partition, and there is the syntactic sugar for calling functions. Rewriting those yields the following:

def qsort: List[Int] => List[Int] = new Function1[List[Int], List[Int]] {
  def apply(input: List[Int]): List[Int] = if (input == Nil) {
    Nil
  } else if (input.isInstanceOf[::[_]] &&
             scala.collection.immutable.::.unapply(input.asInstanceOf[::[Int]]) != None) {
    val unapplyResult0 = scala.collection.immutable.::.unapply(input.asInstanceOf[::[Int]]).get
    val pivot = unapplyResult0._1
    val tail = unapplyResult0._2
    val func0 = new Function1[Int, Boolean] {
      def apply(input: Int): Boolean = input < pivot
    }
    val tmp0 = tail.partition(func0)
    if (Tuple2.unapply(tmp0) == None)
      throw new scala.MatchError(tmp0)
    val unapplyResult1 = Tuple2.unapply(tmp0).get
    val smaller = unapplyResult1._1
    val rest = unapplyResult1._2
    qsort.apply(smaller) ::: pivot :: qsort.apply(rest) 
  } else {
    throw new scala.MatchError(input)
  }
}

For once, the extensive explanations about each syntactic sugar and how it works are being done by others. :-) I hope this complements their answers. Just as a final note, the following two lines are equivalent:

    qsort(smaller) ::: pivot :: qsort(rest) 
    qsort(rest).::(pivot).:::(qsort(smaller))
Daniel
Daniel, thanks! "Accepted" by the skin of the teeth! I would have accepted both answers your's and Oxbow_lakes if I only could.
Totophil
@Totophil I think Oxbow's is the superior answer here, to be honest. Mine is just a complement.
Daniel
Daniel, agree both answers compliment each others and both really stand out for their pertinence, knowledge and effort spent. There are two things however that gave your reply a bit of advantage in my eyes: the very first expansion of the syntactic sugar showing the missing " = input match {" piece I was wondering about. And then there is that tiny but significant mistake in the Oxbow's answer to do with the tail recursion.
Totophil