views:

108

answers:

3

Hi guys,

As part of a bigger problem of enumerating a set, I need to write an OCaml function 'choose' which takes a list and outputs as the list of all possible sequences of size k made up of elements of that list (without repeating sequences which can be obtained from each other by permutation). The order they are put in the end list is not relevant.

For example,

choose 2 [1;2;3;4] = [[1;2];[1;3];[1;4];[2;3];[2;4];[3;4]]

Any ideas?

I would like to have the whole thing to be lazy, outputting a lazy list, but if you have a strict solution, that'll be very useful too.

Thanks in advance, Surikator.

+5  A: 

Here is a strict and suboptimal version. I hope it is clear. It avoids duplicates by assuming there are no duplicates in the input list, and by generating only sublists that are in the same order as in the original list.

The length computation could be factored by passing l's length as an argument of choose. That would make the code less readable but more efficient.

For the lazy version, sprinkle "lazy" and "Lazy.force" on the code...

let rec choose k l =
  if k = 0 
  then [ [] ]
  else
    let len = List.length l in
    if len < k
    then []
    else if k = len
    then [ l ]
    else
      match l with
      h :: t ->
          let starting_with_h =
            (List.map (fun sublist -> h :: sublist) (choose (pred k) t))
          in
          let not_starting_with_h = choose k t in
          starting_with_h @ not_starting_with_h
      | [] -> assert false
;;
  val choose : int -> 'a list -> 'a list list = <fun>

# choose 3 [1; 2; 3; 4; 5; 6; 7] ;;                        
- : int list list =
[[1; 2; 3]; [1; 2; 4]; [1; 2; 5]; [1; 2; 6]; [1; 2; 7]; [1; 3; 4]; [1; 3; 5];
 [1; 3; 6]; [1; 3; 7]; [1; 4; 5]; [1; 4; 6]; [1; 4; 7]; [1; 5; 6]; [1; 5; 7];
 [1; 6; 7]; [2; 3; 4]; [2; 3; 5]; [2; 3; 6]; [2; 3; 7]; [2; 4; 5]; [2; 4; 6];
 [2; 4; 7]; [2; 5; 6]; [2; 5; 7]; [2; 6; 7]; [3; 4; 5]; [3; 4; 6]; [3; 4; 7];
 [3; 5; 6]; [3; 5; 7]; [3; 6; 7]; [4; 5; 6]; [4; 5; 7]; [4; 6; 7]; [5; 6; 7]]

EDIT:

A lazy_list_append as appears necessary from the comments below:

type 'a node_t =             
      | Empty
      | Node of 'a * 'a zlist_t
and 'a zlist_t = 'a node_t lazy_t

let rec lazy_list_append l1 l2 =
  lazy 
    (match Lazy.force l1 with
      Empty -> Lazy.force l2 
    | Node (h, lt) ->
    Node (h, lazy_list_append lt l2))
;;
Pascal Cuoq
That's great! Thanks a lot. Do you have any idea on how to get rid of the @ operator? My lists will have thousands of elements and I can't afford @'s inefficiency. Note that making it lazy is not a matter of sprinkling lazy and force because you use List.length. The lazy counterpart gives you inifinity...
Surikator
@Surikator The `@` removal comes for free with the laziness, if you do it right. That is, your lazy version will use `lazy_list_append` that is not inefficient in the way `@` is.
Pascal Cuoq
If you don't care about the order of the values that return, then can substitute it with `List.rev_append`; which is tail-recursive, and more efficient then `@`.
nlucaroni
@Surikator For the lazy version, do pass the list's length as an argument of `choose`. You can use `type extended_int = Finite of int | Infinite` if you wish to lazily start enumerating infinite possibilities, but my algorithm is not "fair": you will never see some sublists if the input is actually infinite. You will need to change it to generate all possible combinations with the elements already seen.
Pascal Cuoq
@Pascal Ignore my comment on List.length. It's clearly resolvable. I kind of see what you mean concerning @ becoming efficient in the lazy context but I miss the details. Do you have any pointer/suggestion concerning an efficient implementation of lazy_list_append? Thanks a lot, again!
Surikator
@Pascal You're saying your algorithm is not fair if the initial list to choose from is actually infinite, right? If that's the case, this is fine, as the lazy list I'll want to choose from will always be finite. So, no problems there =)
Surikator
@Pascal OK, I've found an efficient lazy_list_append. I think this will work:let rec append l1 l2 = match Lazy.force l1 with | Nil -> l2 | Cons (a, l) -> lazy (Cons (a, append l l2))
Surikator
@Pascal One last question. By using List.length in the lazy context, the list will have to be evaluated so that the length can be known. This makes the algorithm not to be fully lazy. Is there a way to get rid of this problem by passing the appropriate argument to choose (e.g. "len - 1" or something) which doesn't make use of the actual length of the list? Again, thanks a lot for all the very useful feedback.
Surikator
@Surikator It seems your version is better. I'm no specialist. Yours forces the first elements of `l1` before being forced itself, but that may be desirable... I am not sure.
Pascal Cuoq
@Surikator `List.length` is the tip of the problem iceberg. The list will need to be fully evaluated for the first `len` sublists anyway. If you want fully lazy, you need to change the order of emission. See my comment about fairness.
Pascal Cuoq
+2  A: 

Hi again,

Just for the sake of completeness, I am putting here the final code which brings together the strict code from Pascal with my lazy stuff and all other Pascal's useful comments.

The lazy list type is defined, then two auxiliary lazy functions (append and map), and finally the function "choose" that we aim to define.

type 'a node_t =
  | Nil                                            
  | Cons of 'a * 'a t
and 'a t = ('a node_t) Lazy.t

let rec append l1 l2 = 
match Lazy.force l1 with
    | Nil -> l2 
    | Cons (a, l) -> lazy (Cons (a, append l l2))

let rec map f ll = lazy (
match Lazy.force ll with
    | Nil ->    Nil
    | Cons(h,t) -> Cons(f h, map f t) )

let rec choose k l len =
  if k = 0 
  then lazy (Cons(lazy Nil,lazy Nil))
  else
        if len < k
        then lazy Nil
        else if k = len
    then lazy (Cons (l,lazy Nil))
    else
      match Lazy.force l with
          | Cons(h,t) ->  let g h sublist = lazy (Cons (h,sublist))
                          in let starting_with_h = (map (g h) (choose (k-1) t (len-1)))
                          in let not_starting_with_h = choose k t (len-1)
                          in append starting_with_h not_starting_with_h
          | Nil -> assert false

The result of evaluating "choose k ls n" is a lazy list of all choices of k elements of list ls, with ls considered up to size n. Note that, as pointed out by Pascal, because of the way the enumeration takes place, the function choose will not cover all choices of an infinite list.

Thanks, this was really useful!

Best, Surikator.

Surikator
+3  A: 

Plugging in again with a Haskell solution (it's just easier to work with lazy lists since they are built-in):

combinations 0 _ = [[]]
combinations k [] = []
combinations k (x:xs) = map (x:) (combinations (k-1) xs) ++ combinations k xs

The first two cases follow from the properties of binomial coefficients and more specifically: n choose 0 = 1 for all n including n=0 (that's why it is first to handle the case 0 choose 0). The other one is 0 choose k = 0. The third equation is exact translation of the recursive definition of combinations.

Unfortunately when you apply it to an infinite list it returns a trivial solution:

> take 10 $ combinations 3 [1..]
[[1,2,3],[1,2,4],[1,2,5],[1,2,6],[1,2,7],[1,2,8],[1,2,9],[1,2,10],[1,2,11],[1,2,12]]

EDIT: OK, so we really want to go trough each combination in a finite number of steps. With the above version we are obviously using only the expression to the left of ++ which generates only combinations starting with 1. We can work around this problem by defining an interesting list zipping function which builds a list by alternately picking the head of each of its argument lists (it's important to be non-strict in the second argument):

merge [] ys = ys
merge (x:xs) ys = x:merge ys xs

and use it instead of ++:

combinations k (x:xs) = map (x:) (combinations (k-1) xs) `merge` combinations k xs

lets see:

> let comb_10_3 = combinations 3 [1..10]
> let comb_inf_3 = combinations 3 [1..]
> take 10 comb_inf_3 
[[1,2,3],[2,3,4],[1,3,4],[3,4,5],[1,2,4],[2,4,5],[1,4,5],[4,5,6],[1,2,5],[2,3,5]]
> comb_10_3 `intersect` comb_inf_3 == comb_10_3 
True
> last $ combinations 3 [1..10]
[6,8,10]
> elemIndex [6,8,10] $ combinations 3 [1..]
Just 351

All 10 choose 3 combinations are there!

Daniel Velkov
I was more hoping to see an actual working lazy solution in Haskell, not the dysfunctional solution that had already been built and pointed out as flawed. Any ideas for that?
Pascal Cuoq
Actually, the lazy answer I proposed above in OCaml, based on Pascal strict stuff, although not working for infinite lists, works for lazy lists of arbitrary size (just not 'infinite size') =)
Surikator
@Pascal check the edit I made.
Daniel Velkov
@Daniel Velkov The PS: goes without saying (that is, you do not have to say it. We all know how elegant Haskell is. You can stop saying it now. Stop!). +1 despite the PS:
Pascal Cuoq
OK, no PS, lets have peace.
Daniel Velkov
@Daniel Thanks for that! Elegant idea there with the 'merge'. Definitely essential for dealing with infinite lists. @Pascal Yes, let's not go for Haskell/Ocaml discussions. We're all happy with our own choices, so let's keep it peaceful. =)
Surikator
@Daniel I think this question should keep the Lazy tag. My summary answer bringing everything together uses all the lazy arsenal. When I googled for an answer I was using the keyword lazy. And I'd be happy to find this page. Anyway, I'll leave it to you to decide.
Surikator