views:

364

answers:

4

When I have two lists in OCaml, for example

e1 = [3; 4; 5; 6; 7]

and

e2 = [1; 3; 5; 7; 9]

Is there an efficient way to get the intersection of those two lists? I.e.:

[3; 5; 7]

Because I don't like scanning every element in list e2 for every element in list e1, thus creating a big Oh of order n^2.

+3  A: 

I don't know OCaml (syntax-wise), but generally you can do this in two ways:

  1. If your language has support for a Set-datastructure, then convert both lists into Sets and use the set-intersection operation.

  2. More generally: Sort both lists, then scan the sorted lists, which makes finding the duplicates much more efficient. You take n log(n) for sorting and can find the duplicates in linear time then.

Frank
OCaml do have set operation: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Set.S.htmlNote that bot solution are equivalent in term of complexity (with ocaml set).
Rémi
+5  A: 

As Franck and Rémi said, converting your lists to sets (from stdlib module Set) costs n log(n), and then Sets provides a linear implementation of intersection. Franck also mentioned the equivalent alternative to sort the lists, and then traverse them in a synchronized way. These are roughly the same (and by the way, in both cases you need to be able to provide a total ordering on the elements in your lists).

If intersections are an important part of your algorithm and you want them to be faster in the case of two sets of elements that are only slightly different, you need to switch to a mergeable structure such as Patricia trees. See files pt* in http://www.lri.fr/~filliatr/ftp/ocaml/ds/ .

If you need intersection to be fast in all cases, you have the possibility to use hash-consed Patricia trees. Hash-consing helps to recognize structurally identical sub-trees, and help to build efficient caches for previous operations by making comparison cheap.

Patricia trees can not use an arbitrary type as key (typically they are presented with ints as keys). But you can sometimes work around this limitation by numbering at creation each value you intend to use as a key.

Pascal Cuoq
+4  A: 

My OCaml isn't the best, but I hacked this function together which will intersect sorted lists:

let rec intersect l1 l2 =
    match l1 with [] -> []
        | h1::t1 -> (
          match l2 with [] -> []
              | h2::t2 when h1 < h2 -> intersect t1 l2
              | h2::t2 when h1 > h2 -> intersect l1 t2
              | h2::t2 -> (
                match intersect t1 t2 with [] -> [h1]
                    | h3::t3 as l when h3 = h1 -> l
                    | h3::t3 as l -> h1::l
              )
        );;

which should run in O(n+m) time. Basically it checks the first element of each list. If they are equal, it stores the result of the recursive call to their tails, and then checks to see if the head of the stored result is equal to the heads of the lists. If it isn't, it inserts it, otherwise it's a duplicate and it ignores it.

If they aren't equal, it just advances whichever one is smaller.

Niki Yoshiuchi
The function seems fine to me. I have the tiniest of remarks though. If you write `| h3::t3 as l -> h1::l` instead of `| h3::t3 -> h1::(h3::t3)`, you may save the compiler the allocation of a new cons cell to create a new list identical to one it already has. The compiler could do this optimization itself, but it probably won't.
Pascal Cuoq
Good call, I will edit my post and add that.
Niki Yoshiuchi
+2  A: 

As @Frank suggested, you can use sets to solve this problem, although it is not the best answer ever, but here is a short code listing demonstrating how this could be achieved in OCaml :

module Int_set = Set.Make (struct
                             type t = int
                             let compare = compare
                           end);;

(* iters through a list to construct a set*)
let set_of_list = List.fold_left (fun acc x -> Int_set.add x acc) Int_set.empty;;

let e1 = [3; 4; 5; 6; 7];;
let e2 = [1; 3; 5; 7; 9];;

let s1 = set_of_list e1;;
let s2 = set_of_list e2;;

(*result*)
let s3 = Int_set.inter s1 s2;;


(*testing output*)
Int_set.iter (fun elt -> print_int elt;print_string "\n") s3;;

The output is :

3
5
7
- : unit = ()
martani_net