I know there's a "not" on ienumerable thanks to linq which takes a collection to not against, but I'm worried about big oh performance What's the fastest algorithm to do this?
The only way to remove a subset of items from an IEnumerable<T>
is to loop through the superset and for each item in the superset loop through the subset, removing that item from the superset if it is found in the subset.
This will give you O(n²) on average.
Now if there is additional information about these collections (perhaps one or both is a list or maybe one or both of the sequences are sorted) that could help you create a more performant solution.
Nevermind, use the Enumerable.Except
extension method:
Produces the set difference of two sequences.
You may get better performance by converting the superset into a hashtable (O(n) typically, but then allows you to perform lookups in constant time). Then you can enumerate over the subset and check whether each item exists in the superset. The entire operation should take O(n) extra memory and O(n) time.
If you can iterate over the sets in order, you can guarantee O(n) behaviour (rather than the "typically O(n) but possibly O(n²) in the worst case" that a hashset has) by iterating through them both in lockstep.
For example:
//loop boilerplate
if(itemA < itemB) {
itemA = a.next();
continue;
}
if(itemA > itemB) {
itemB = b.next();
continue;
}
a.remove(itemA);
You'll need to add boundary checking and other boilerplate yourself.