I understand that:

head (map (2**) [1..999999])

Will only actually evaluate 2**1, and none of the rest, but the book I am reading says that:

head (sort somelist)

Will only need to find the smallest item in the list, because that is all that is used. How does this work? As far as I can tell, this would be impossible with the sorting algorithms I know (like bubble sorting).

The only way I can think that this would work is if the sorting algorithm were to go through the entire list looking for the smallest item, and then recurse on the list without that item. To me, this sounds really slow.

Is this how the sort function works, or is there another sorting algorithm I don't know about that allows short-circuiting like this?

+2  A: 

The algorithm you just described has a specific name: "selection sort". It's O(n²), so it's not quite the fastest thing you could do. However, if you want the first "k" elements of the sorted list, the complexity is O(kn), which is nice if "k" is small enough (as in your example).
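In Haskell, that lazily-consumable selection sort is only a couple of lines. A sketch (`selectionSort` is my own name for it, not a library function):

```haskell
import Data.List (delete)

-- Lazy selection sort: each cons cell finds one minimum, so forcing
-- only the first element costs a single O(n) pass over the list,
-- and forcing the first k elements costs O(k*n) overall.
selectionSort :: Ord a => [a] -> [a]
selectionSort [] = []
selectionSort xs = m : selectionSort (delete m xs)
  where
    m = minimum xs
```

Here `head (selectionSort xs)` forces only the outermost `m`, i.e. one `minimum` pass; the recursive call remains an unevaluated thunk.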

Note that you are using a pure function in a functional language. The compiler is likely to be able to generate optimized code for sort in both cases by looking at the way functions are composed. It can easily infer that you want the minimum element when you compose head and sort.

Mehrdad Afshari
This last part is incorrect; compilers cannot infer intention!
Porges
Porges: While the compiler can be hardwired to analyze intention in specific cases, you do **not** need to infer *intention*. You need to mechanically use a theorem prover to prove that the optimized version of the code is mathematically equal to the original version. Functional languages make this theorem proving easier by disallowing side effects.
Mehrdad Afshari
Possibly, but I don't know of any Haskell compilers that include automated theorem provers as part of their optimization pass. The reason this composition of functions works is solely due to the default-lazy nature of Haskell.
Porges
+5  A: 

If you create a comparison function that traces its arguments, like this in GHCi's command line:

> :module + Data.List Debug.Trace
> let myCompare x y = trace ("\tCmp " ++ show x ++ " " ++ show y) $ compare x y

then you can see the behaviour yourself:

> sortBy myCompare "foobar"

"     Cmp 'f' 'o'
      Cmp 'o' 'b'
      Cmp 'f' 'b'
      Cmp 'a' 'r'
      Cmp 'b' 'a'
a     Cmp 'b' 'r'
b     Cmp 'f' 'o'
      Cmp 'f' 'r'
f     Cmp 'o' 'o'
      Cmp 'o' 'r'
o     Cmp 'o' 'r'
or"

Haskell is evaluating the string lazily, one character at a time. The left hand column is being printed as each character is found, with the right hand column recording the comparisons required, as printed by "trace".

Note that if you compile this, especially with optimisations on, you might get a different result. The optimiser runs a strictness analyser that will probably notice that the entire string gets printed, so it would be more efficient to evaluate it eagerly.

Then try

> head $ sortBy myCompare "foobar"

      Cmp 'f' 'o'
      Cmp 'o' 'b'
      Cmp 'f' 'b'
      Cmp 'a' 'r'
      Cmp 'b' 'a'
'a'

If you want to understand how this works, take a simple quicksort (the actual Data.List sort is a merge sort, but the principle is the same) and evaluate 'qsort "foobar"' manually on paper.

qsort [] = []
qsort (x:xs) = qsort less ++ [x] ++ qsort greater
   where (less, greater) = partition (< x) xs

So

   qsort ('f':"oobar")
 = qsort ('b':"a") ++ "f" ++ qsort ('o':"or")
 = ("a" ++ "b") ++ "f" ++ qsort ('o':"or")

And now we have done enough to find that 'a' is the first item in the result without having to evaluate the other call to "qsort". I've omitted the actual comparisons because they're hidden inside the call to "partition". Actually "partition" is also lazy, so in fact the argument to the other "qsort" hasn't been evaluated as far as I've shown it.
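You can see partition's laziness for yourself: it even works on an infinite list, because it builds both result lists on demand. A small check of my own:

```haskell
import Data.List (partition)

-- partition produces both result lists lazily, so demanding a prefix
-- of one list forces only as much of the input as needed.
evens :: [Int]
evens = fst (partition even [0 ..])
```

`take 3 evens` yields `[0,2,4]`; neither the rest of `[0 ..]` nor the list of odd numbers is ever forced.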

Paul Johnson
+8  A: 

This:

Will only need to find the smallest item in the list, because that is all that is used.

... should really say that the function only needs to do the minimal amount of work that the sorting algorithm requires to find the smallest element.

For example, if we are using quicksort as our underlying sorting algorithm, then head . quicksort is equivalent to the selection algorithm known as 'quickselect', which runs in expected (not worst-case) linear time. Moreover, we can implement k-quickselect merely by take k . quicksort.
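With the lazy qsort from the other answer, that composition really is all there is to it. A sketch (`kSmallest` is my own name, and this is an illustration, not a benchmark):

```haskell
import Data.List (partition)

-- Lazy quicksort.
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort less ++ [x] ++ qsort greater
  where (less, greater) = partition (< x) xs

-- Laziness turns this composition into a quickselect-like algorithm:
-- only the branches needed for the first k elements get sorted.
kSmallest :: Ord a => Int -> [a] -> [a]
kSmallest k = take k . qsort
```

Forcing `kSmallest 1 xs` never evaluates the `qsort greater` branches along the way, which is exactly the pruning quickselect does explicitly.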

Wikipedia notes in its article upon selection algorithms that (my emphasis):

Because language support for sorting is more ubiquitous, the simplistic approach of sorting followed by indexing is preferred in many environments despite its disadvantage in speed. Indeed, for lazy languages, this simplistic approach can even get you the best complexity possible for the k smallest/greatest sorted (with maximum/minimum as a special case) if your sort is lazy enough.

Quicksort works well in this scenario, whereas the default sort in Haskell (merge sort) doesn't compose quite as well, as it does more work than strictly necessary to return each element of the sorted list. As this post on the Haskell mailing list notes:

lazy quicksort is able to produce the batch of the first k smallest elements in

O(n + k log k) total time [1]

while lazy mergesort needs

O(n + k log n) total time [2]
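You can observe this kind of difference informally by counting comparisons with an instrumented comparator. The unsafePerformIO counter below is a quick-and-dirty measurement hack of my own (run it unoptimised, e.g. with runghc, or the counts may be distorted), not production code:

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.List (sortBy)
import System.IO.Unsafe (unsafePerformIO)

-- Count how many comparisons sortBy performs before `consume` can
-- deliver its result. Forcing only the head of the sorted list
-- triggers far fewer comparisons than forcing all of it.
comparisonsTo :: Ord a => ([a] -> a) -> [a] -> Int
comparisonsTo consume xs = unsafePerformIO $ do
  ref <- newIORef (0 :: Int)
  let cmp x y = unsafePerformIO (modifyIORef' ref (+ 1)) `seq` compare x y
  consume (sortBy cmp xs) `seq` readIORef ref
```

On a shuffled 100-element list, `comparisonsTo head` comes out well below `comparisonsTo last`, since `last` forces the entire sorted list.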

For more you might like to read this blog post.

Porges