views:

95

answers:

3

Trying to find an efficient way to extract all instances of items in an array out of another.

For example

array1 = ["abc", "def", "ghi", "jkl"]

array2 = ["abc", "ghi", "456", "789"]

Array 1 is an array of items that need to be extracted out of array 2. Thus, array 2 should be modified to ["456", "789"]

I know how to do this, but no in an efficient manner.

+3  A: 

If your lists can't contain duplicates and you don't care about the order then you should be using sets instead of lists (by the way, they are called lists, not arrays). Then what you want is both fast and trivial to implement:

>>> set1 = set(["abc", "def", "ghi", "jkl"])
>>> set2 = set(["abc", "ghi", "456", "789"])
>>> set2 - set1
set(['456', '789'])

If list2 can contain duplicates or the order matters then you can still make list1 a set to speed up the lookups:

>>> list1 = ["abc", "def", "ghi", "jkl"]
>>> list2 = ["abc", "ghi", "456", "789"]
>>> set1 = set(list1)
>>> [a for a in list2 if a not in set1]
['456', '789']

Note that this requires that the items are hashable but runs in close to O(n) time.

If the items are not hashable but they are orderable then you could sort list1 and use a binary search to find items in it. This gives O(n log(n)) time.

If your items are neither hashable not orderable then you will need to resort to the slow O(n*n) simple linear search for each element.

Mark Byers
And if you don't care about order.
Thomas Wouters
No..don't want the difference between the two. While it's not the exact scenario...thing of array 1 as a list of bad words, array two as a list of searchwords. I want to get a result of the searchwords with all of the bad words from a master list in array 1 removed.
Scott
@Thomas: Thanks, I've included that point in my answer.
Mark Byers
re: binary search: the bisect module (http://docs.python.org/library/bisect.html ) is good for that.
Devin Jeanpierre
@Scott: No, I just got list1 and list2 the wrong way round - it is fixed now. By the way, badwords should definitely be a set - the order doesn't matter and there is no reason to have duplicates.
Mark Byers
@Scott, It sounds like at least `list2` should be a set all along, possibly `list1` too.
Mike Graham
+6  A: 

These are lists, not arrays. (The word "array" means different things to different people, but in python the objects call themselves lists, and that's that; there are other modules that provide objects that call themselves arrays, such as array and numpy)

To answer your question, the easiest way is to not modify array2 at all. Use a list comprehension:

set1 = set(array1)
array2 = [e for e in array2 if e not in set1]

(the set makes this O(n) instead of O(n^2))

If you absolutely must mutate array2 (because it exists elsewhere), you can use slice assignment:

array2[:] =  [e for e in array2 if e not in set1]

It's just as efficient, but kind of nasty.

edit: as Mark Byers points out, this only works if array1 only contains hashable elements (such as strings, numbers, etc.).

Devin Jeanpierre
If you don't care about duplicates or order you should consider doing `set(array2) - set(array2)`.
Jules
A: 

The straightforward way would be something like;

array2 = [i for i in array2 if i not in array1]

list comprehensions is what you need here

Nick