tags:

views:

82

answers:

3

I know there's a "not" on ienumerable thanks to linq which takes a collection to not against, but I'm worried about big oh performance What's the fastest algorithm to do this?

+1  A: 

The only way to remove a subset of items from an IEnumerable<T> is to loop through the superset and for each item in the superset loop through the subset, removing that item from the superset if it is found in the subset.

This will give you O(n²) on average.

Now if there is additional information about these collections (perhaps one or both is a list or maybe one or both of the sequences are sorted) that could help you create a more performant solution.

If you are interested, here is an extension method that will do what I just described: public static IEnumerable Exclude (this IEnumerable source, IEnumerable items) { foreach (T t in source) if (!items.Contains(t)) yield return t; }


Nevermind, use the Enumerable.Except extension method:

Produces the set difference of two sequences.

Andrew Hare
Okay, let's assume I can sort on a particular property of the contained objects; how can I use that?
Pierreten
If you're using a hashset (which is typically O(1) to check containment or add/remove), this is O(n), where n is the size of the superset. If the subset is smaller, you're better off iterating over that and removing each item in it from the superset.
Anon.
Unless you are trying to remove duplicates, in which case if the hash key is picked correctly this will be an O(1) problem since the hash key will take care of removing the duplicates initially. Loading keys will still be O(n) though.
GrayWizardx
A: 

You may get better performance by converting the superset into a hashtable (O(n) typically, but then allows you to perform lookups in constant time). Then you can enumerate over the subset and check whether each item exists in the superset. The entire operation should take O(n) extra memory and O(n) time.

Dathan
+1  A: 

If you can iterate over the sets in order, you can guarantee O(n) behaviour (rather than the "typically O(n) but possibly O(n²) in the worst case" that a hashset has) by iterating through them both in lockstep.

For example:

//loop boilerplate
if(itemA < itemB) {
    itemA = a.next();
    continue;
}
if(itemA > itemB) {
    itemB = b.next();
    continue;
}
a.remove(itemA);

You'll need to add boundary checking and other boilerplate yourself.

Anon.