I was asked in an interview (C# 3.0) to provide logic to remove a list of items from another list.

I responded

int[] items={1,2,3,4}; 
List<int> newList = new List<int>() { 1, 2, 3, 4, 5, 56, 788, 9 };
newList.RemoveAll((int i) => { return items.Contains(i); });

1) The interviewer replied that the algorithm I had employed will take progressively longer as the list of items grows, and asked me for an even better, faster one. What would be an efficient algorithm?

2) How can I achieve the same using LINQ?

3) He asked me to provide an example of a "two-way closure". (I am aware of closures in general, but what is a two-way closure? I replied that no such term exists, but he was not satisfied.)

+10  A: 

EDIT: A better solution is to use Except, which, unlike Intersect, isn't symmetrical.

1 & 2: you can use the Intersect extension method to do this. However, if your second array contains elements not found in the first one, these will then be in the resulting list: Intersect works symmetrically.

As for “two-way closure”, I’ve never heard of this term and I rather doubt that it’s an established technical term.
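To illustrate the symmetry point, here is a small sketch of my own (assuming LINQ to Objects; the extra element 100 in the removal array is deliberate):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ExceptVsIntersect
{
    static void Main()
    {
        int[] items = { 1, 2, 3, 4, 100 };   // 100 is not in newList
        var newList = new List<int> { 1, 2, 3, 4, 5, 56, 788, 9 };

        // Except: elements of newList not found in items.
        // Extra elements in items (like 100) are simply ignored.
        var kept = newList.Except(items);      // 5, 56, 788, 9

        // Intersect: elements present in both sequences.
        var common = newList.Intersect(items); // 1, 2, 3, 4

        Console.WriteLine(string.Join(",", kept));
        Console.WriteLine(string.Join(",", common));
    }
}
```

Note that Except yields distinct elements and preserves the order of the first sequence, so it answers "remove these items" directly, whereas Intersect gives you the items that would be removed.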

Konrad Rudolph
Thanks Konrad Rudolph :)
+3  A: 

If not using Except, and if you wanted your solution to scale to large lists, your best bet would be to sort the second list or to make a hash table out of it, so that for every element of the first list, you can easily identify it in the second. (That's how Except works, more or less.)
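A minimal sketch of the hash-table idea (names are mine, not from the thread): build a HashSet from the removal list once, then filter the main list with O(1) average-time lookups, for O(n + m) overall.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class RemoveWithHashSet
{
    // Returns a new list containing the elements of source that are not in toRemove.
    // Building the set costs O(m); the scan costs O(n).
    public static List<int> RemoveItems(List<int> source, IEnumerable<int> toRemove)
    {
        var removeSet = new HashSet<int>(toRemove);   // O(1) average lookup
        return source.Where(i => !removeSet.Contains(i)).ToList();
    }

    static void Main()
    {
        var newList = new List<int> { 1, 2, 3, 4, 5, 56, 788, 9 };
        var result = RemoveItems(newList, new[] { 1, 2, 3, 4 });
        Console.WriteLine(string.Join(",", result));   // 5,56,788,9
    }
}
```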

mquander
A: 

I don't believe the code you posted works. A standard array doesn't have a Contains method.

Disregarding that issue, perhaps the problem he saw with your response is that items.Contains is executed once for every element of the full list, so the list of items to be removed is searched linearly for each item in the complete list.

As everyone else has mentioned, using the Except method would be more efficient.

EDIT: Didn't see that you said C# 3.0. My fault. There is still a naming mismatch between items1 and items, though.

Brian Hasden
IEnumerable<T> has a Contains extension method defined in C# 3.0+.
mquander
Brian, are you aware of extension methods? `Contains` is an extension method on the `Enumerable` class which is applied to all `IEnumerable`s, hence also to arrays.
Konrad Rudolph
Boss, that is an extension method!
+5  A: 

Your answer is O(N^2) because you have to search the removal list for every element in the large list. You might have some luck using a hash table, or a sorted list with binary search (in the case of integers/strings), or other ways of reducing the look-up overhead, which will get you to at least O(N log N).

Of note: if the size of the small list isn't similar to the size of the large list, your solution is O(N * M); you would want to optimize against the typically larger list first. If you can sort the first list, that's a good choice; if you aren't allowed to modify it, don't forget to sort/hash the second list.
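The sort-plus-binary-search variant mentioned above can be sketched like this (my own illustration, not from the answer): sort the removal list once, then binary-search it for each element of the main list, giving O((n + m) log m).

```csharp
using System;
using System.Collections.Generic;

class RemoveWithBinarySearch
{
    public static List<int> RemoveItems(List<int> source, int[] toRemove)
    {
        Array.Sort(toRemove);                 // O(m log m), done once
        var result = new List<int>(source.Count);
        foreach (var i in source)
        {
            // Array.BinarySearch returns a non-negative index only on a hit.
            if (Array.BinarySearch(toRemove, i) < 0)
                result.Add(i);                // O(log m) per element
        }
        return result;
    }

    static void Main()
    {
        var newList = new List<int> { 1, 2, 3, 4, 5, 56, 788, 9 };
        Console.WriteLine(string.Join(",", RemoveItems(newList, new[] { 4, 2, 1, 3 })));
        // 5,56,788,9
    }
}
```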

Broam
If the target list has a length n, and the list of items to remove has a length m, then the OP's solution is O(n*m), not O(n^2).
Juliet
Agreed; editing post.
Broam
+5  A: 

Example using Except

var exclusions = new List<int>() { 1, 2, 3, 4 };
var newList = new List<int>() { 1, 2, 3, 4, 5, 56, 788, 9 };
IEnumerable<int> result = newList.Except(exclusions);
Gord
A: 
int[] items1 = { 1, 2, 3, 4 };
List<int> newList = new List<int>() { 1, 2, 3, 4, 5, 56, 788, 9 };
newList.RemoveAll((int i) => { return items1.Contains(i); });

1) The interviewer replied that the algorithm I employed will take progressively longer as the item list grows, and asked me for an even better, faster one. What would be an efficient algorithm?

Your items1 has length m, and newList has length n. Since you're performing a linear lookup on items1 for each item in newList, your solution is O(n*m). From my point of view, your implementation is perfectly acceptable, and we probably shouldn't worry about optimizations until we need to.

Evidently, however, it's not good enough for the interview. Alright, what's "good enough" to the interviewer? Who knows. Now, if you don't mind taking some risks in your interview, you can challenge your interviewer to give more detail on the problem. Inform him that many factors and assumptions affect the speed of our code, especially the choice of data structure:

  • Assuming you require indexed access to newList, then converting items1 to a HashSet (which has O(1) lookup) improves your algorithm to the O(n) time required to rebuild the array.

  • Assuming newList is a "bag" of elements, meaning we don't care about the order of elements in the collection, then representing newList with a hashset allows us to remove m items in O(m) time.

  • Assuming we need to preserve the insertion order of elements in the collection, you can represent newList as a LinkedList<T>, while using a Dictionary<T, LinkedListNode<T>> as a lookup table. When you add an item to the collection, you add it to the LinkedList, then add the newly created node to the dictionary. The LinkedList keeps the insertion order, and the Dictionary allows us to remove m items in O(m) time. However, we sacrifice indexed access to the collection.

  • Assuming newList keeps items in sorted order (which doesn't appear to be the case with the sample code provided, but humor me), most flavors of balanced binary tree will remove m items in O(m log n), and they also support O(n) sorted-order traversal. Or, if you're in the right mood, a skip list does the same job, only sillier.
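The linked-list-plus-dictionary idea above can be sketched roughly like this (the class name and structure are mine; it assumes distinct items for simplicity and uses modern C# syntax):

```csharp
using System;
using System.Collections.Generic;

// Keeps insertion order while supporting O(1) removal by value.
// Assumes items are distinct.
class OrderedBag<T>
{
    private readonly LinkedList<T> order = new LinkedList<T>();
    private readonly Dictionary<T, LinkedListNode<T>> index =
        new Dictionary<T, LinkedListNode<T>>();

    public void Add(T item) => index[item] = order.AddLast(item);

    public bool Remove(T item)
    {
        if (!index.TryGetValue(item, out var node)) return false;
        order.Remove(node);        // O(1) given the node itself
        return index.Remove(item);
    }

    public IEnumerable<T> Items => order;   // insertion order preserved
}

class Demo
{
    static void Main()
    {
        var bag = new OrderedBag<int>();
        foreach (var i in new[] { 1, 2, 3, 4, 5, 56, 788, 9 }) bag.Add(i);
        foreach (var i in new[] { 1, 2, 3, 4 }) bag.Remove(i);   // m removals, O(m) total
        Console.WriteLine(string.Join(",", bag.Items));          // 5,56,788,9
    }
}
```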

3) He asked me to provide an example of a "two-way closure". (I am aware of closures in general, but what is meant by a two-way closure? I replied that I had never heard the term, but he was not satisfied.)

I've never heard of it either, and Google doesn't appear to turn up any relevant articles (apart from this thread, none of the results on the first 5 pages are programming or even computer related). The interviewer is an idiot.

Juliet