views:

141

answers:

3

I know I can remove the last element from a set:

s.Remove(s.MaximumElement)

But if I want to remove the n maximum elements... do I just execute the above n times, or is there a faster way to do that?

To be clear, this is an obvious solution:

let rec removeLastN (s : Set<'a>, num : int) : Set<'a> = 
    match num with
    | 0 -> s
    | _ -> removeLast(s.Remove(s.MinimumElement), num-1)

But it involves creating a new set n times. Is there a way to do it and only create a new set once?

A: 

That is already a pretty good solution. OCaml has a split function that can split a Set so you can find the right element then you can split the Set to remove a bunch of elements at a time. Alternatively, you can use Set.difference to extract another Set of elements.

Jon Harrop
I can't find Set.split in the F# documentation, can you post a linky?
Juliet
Looks like they removed it from F#. :-(
Jon Harrop
A: 

In F#, you can use Set.partition or Set.filter to create sub sets:

let s = Set([1;4;6;9;100;77])

let a, b = Set.partition (fun x -> x <= 10) s

let smallThan10 = Set.filter (fun x -> x < 10) s

In your question, maybe you don't know the value of the ith number of your set, so here is a handy function for that:

let nth (n:int) (s:'a Set) = 
    s |> Set.toSeq |> Seq.nth n

Now, we can write the remove-top-n function:

let removeTopN n (s:'a Set) = 
    let size = s.Count
    let m = size - n
    let mvalue = nth m s
    Set.filter (fun x -> x < mvalue) s

and test it:

removeTopN 3 s

and we get:

val it : Set<int> = set [1; 4; 6]

Notice that the removeTopN does not work for a set containing multiple same values.

Yin Zhu
Will have to recommend against `partition` and `filter` here, they both evaluate every item in the collection and take `O(n lg n)` time to rebuild a brand new set. In principle, if a user knows what node they're looking for, they should be able to split a tree into left and right halfs in O(lg n) time, but I don't think F# Set supports that functionality out of the box.
Juliet
+1  A: 

But it involves creating a new set n times. Is there a way to do it and only create a new set once?

To the best of my knowledge, no. I'd say what you have a perfectly fine implementation, it runs in O(lg n) -- and its concise too :) Most heap implementations give you O(lg n) for delete min anyway, so what you have is about as good as you can get it.

You might be able to get a little better speed by rolling your balanced tree, and implementing a function to drop a left or right branch for all values greater than a certain value. I don't think an AVL tree or RB tree are appropriate in this context, since you can't really maintain their invariants, but a randomlized tree will give you the results you want.

A treap works awesome for this, because it uses randomization rather than tree invariants to keep itself relatively balanced. Unlike an AVL tree or a RB-tree, you can split a treap on a node without worrying about it being unbalanced. Here's a treap implementation I wrote a few months ago:

http://pastebin.com/j0aV3DJQ

I've added a split function, which will allows you take a tree and return two trees containing all values less than and all values greater than a given value. split runs in O(lg n) using a single pass through the tree, so you can prune entire branches of your tree in one shot -- provided that you know which value to split on.

But if I want to remove the n maximum elements... do I just execute the above n times, or is there a faster way to do that?

Using my Treap class:

open Treap

let nthLargest n t = Seq.nth n (Treap.toSeqBack t)
let removeTopN n t =
    let largest = nthLargest n t
    let smallerValues, wasFound, largerValues = t.Split(largest)
    smallerValues

let e = Treap.empty(fun (x : int) (y : int) -> x.CompareTo(y))
let t = [1 .. 100] |> Seq.fold (fun (acc : Treap<_>) x -> acc.Insert(x)) e
let t' = removeTopN 10 t

removeTopN runs in O(n + lg m) time, where n is the index into the tree sequence and m is the number of items in the tree.

I make no guarantees about the accuracy of my code, use at your own peril ;)

Juliet