views:

346

answers:

2

Hello world!

I’m trying to implement a tail-recursive list-sorting function in OCaml, and I’ve come up with the following code:

let tailrec_merge_sort l =
  let split l = 
    let rec _split source left right =
      match source with
        | [] -> (left, right)
        | head :: tail -> _split tail right (head :: left) 
    in _split l [] []
  in

  let merge l1 l2 = 
    let rec _merge l1 l2 result =
      match l1, l2 with
        | [], [] -> result
        | [], h :: t | h :: t, [] -> _merge [] t (h :: result)
        | h1 :: t1, h2 :: t2 ->
            if h1 < h2 then _merge t1 l2 (h1 :: result)
            else            _merge l1 t2 (h2 :: result)
    in List.rev (_merge l1 l2 [])
  in

  let rec sort = function
    | [] -> []
    | [a] -> [a]
    | list -> let left, right = split list in merge (sort left) (sort right)
  in sort l
;;

Yet it seems that it is not actually tail-recursive, since I encounter a "Stack overflow during evaluation (looping recursion?)" error.

Could you please help me spot the non tail-recursive call in this code? I've searched quite a lot, without finding it. Cout it be the let binding in the sort function?

Thanks a lot, CFP.

+5  A: 

The expression

merge (sort left) (sort right)

is not tail-recursive; it calls both (sort left) and (sort right) recursively while there is remaining work in the call (merge).

I think you can fix it with an extra continuation parameter:

  let rec sort l k =
    match l with
    | [] -> k [] 
    | [a] -> k [a] 
    | list -> let left, right = split list in sort left (fun leftR -> sort right (fun rightR -> k (merge leftR rightR)))
  in sort l (fun x -> x)
Brian
Oh, I think I understand; thanks! But then, how can I make my function recursive?
CFP
Could you explain why continuations do actually make the function tail-recursive? Or do they just move the process of capturing the stack-frame from the (potentially overflowing) stack to the generated closure?
Dario
Hmmm, I guess this should work, but I don't really understand what the k function do. Could you please explain it a little? Thanks a lot!I've tested it though, but it doesn't solve the overflow problem... Any idea why?
CFP
I didn't test the code, so I may have made a mistake. :) The best explanation of the continuation strategy I have is here: http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!170.entry
Brian
It's a bug with the tail-rec optimization in Caml I guess. Anyway the doc. is excellent, thanks a lot!
CFP
+5  A: 

Merge sort is inherently not tail-recursive. A sort requires two recursive calls, and in any execution of any function, at most one dynamic call can be in tail position. (split is also called from non-tail position, but since it should use constant stack space that should be OK).

Be sure you are using a recent version of OCaml. In versions 3.08 and older, List.rev could blow the stack. This problem is fixed in version 3.10. Using version 3.10.2, I can sort a list of ten million elements using your code. It takes a couple of minutes, but I don't blow the stack. So I'm hoping your problem is simply that you have an old version of OCaml.

If not, a good next step would be to set the environment variable

OCAMLRUNPARAM=b=1

which will give a stack trace when you blow the stack.

I'd also like to know the length of the arrays you are sorting; although merge sort cannot be tail-recursive, its non-tail nature should cost you logarithmic stack space.

If you need to sort more than 10 million elements, maybe you should be looking at a different algorithm? Or if you want, you could CPS-convert mergesort by hand, which will yield a tail-recursive version at the cost of allocating contiuations on the heap. But my guess is that it won't be necessary.

Norman Ramsey
Hmmm, since split is not in last position, does it count? (I mean, as I understand it, the compiler should be able to detect a tail-recursive function and convert it into a loop ; then, only the last call would matter) Furthermore, using continuations should make the function tail-recursive, shouldn't it?
CFP
I'm using OCaml v11.0, and I blow the stack when running my code on 10^6 elements. I need to sort between 5 and 10 million elements.
CFP
Finally, my problem is that even using continuations I do blow the stack. Any idea why?
CFP
By the way, is there a way to get a backtrace when running toplevel programs?
CFP
@CFP: Setting the environment variable should get you a backtrace. If not, just compile your test case and run it as a standalone binary.
Norman Ramsey
It doesn't, but compiling works =)
CFP