views:

2225

answers:

11

When I am writing code in Python, I often need to remove items from a list or other sequence type based on some criteria. I haven't found a solution that is elegant and efficient, as removing items from a list you are currently iterating through is bad. For example, you can't do this:

for name in names:
    if name[-5:] == 'Smith':
        names.remove(name)

I usually end up doing something like this:

toremove = []
for name in names:
    if name[-5:] == 'Smith':
        toremove.append(name)
for name in toremove:
    names.remove(name)
del toremove

This is innefficient, fairly ugly and possibly buggy (how does it handle multiple 'John Smith' entries?). Does anyone have a more elegant solution, or at least a more efficient one?

How about one that works with dictionaries?

A: 

Well, this is clearly an issue with the data structure you are using. Use a hashtable for example. Some implementations support multiple entries per key, so one can either pop the newest element off, or remove all of them.

But this is, and what you're going to find the solution is, elegance through a different data structure, not algorithm. Maybe you can do better if it's sorted, or something, but iteration on a list is your only method here.

edit: one does realize he asked for 'efficiency'... all these suggested methods just iterate over the list, which is the same as what he suggested.

nlucaroni
+1  A: 
names = filter(lambda x: x[-5:] != "Smith", names);
pottedmeat
+1  A: 

filter would be awesome for this. Simple example:

names = ['mike', 'dave', 'jim']
filter(lambda x: x != 'mike', names)
['dave', 'jim']

Edit: Corey's list comprehension is awesome too.

mk
+6  A: 

Using a list comprehension

list = [x for x in list if x[-5:] != "smith"]
Corey
Does not really seem to be working for integers. temprevengelist = "0-12354-6876" temprevengelist = temprevengelist.split('-') list = [x for x in temprevengelist if x[-5:] != 6876]
Fahim Akhter
+11  A: 

Two easy ways to accomplish just the filtering are:

  1. Using "filter":

    names = filter(names, lambda name: name[-5:] != "Smith")

  2. Using list comprehensions:

    names = [name for name in names if name[-5:] != "Smith"]

Note that both cases keep the values for which the predicate function evaluates to True, so you have to reverse the logic (i.e. you say "keep the people who do not have the last name Smith" instead of "remove the people who have the last name smith").

Edit Funny... two people individually posted both of the answers I suggested as I was posting mine.

John
Also, generator expressions.
Aaron Maenpaa
`not name.endswith("Smith")` looks much nicer :-)
THC4k
sure, if you like readability or something.
John
+2  A: 

Both solutions, filter and comprehension requires building a new list. I don't know enough of the Python internals to be sure, but I think that a more traditional (but less elegant) approach could be more efficient:

names = ['Jones', 'Vai', 'Smith', 'Perez']

item = 0
while item <> len(names):
    name = names [item]
    if name=='Smith':
        names.remove(name)
    else:
        item += 1

print names

Anyway, for short lists, I stick with either of the two solutions proposed earlier.

PabloG
I think the names.remove(name) might be a O(n) operation, which would make this a O(n^2) algorithm.
postfuturist
I would personally write my while expression as item < len(names), just in case I screwed up the logic inside the loop. (even though it doesn't look like you did)
Miquella
It's probably more efficient to use del names[item] or names.pop(item) than names.remove(name). That's much less likely to be O(n), although I don't know the actual internals of how it works.
rjmunro
+1  A: 

The filter and list comprehensions are ok for your example, but they have a couple of problems:

  • They make a copy of your list and return the new one, and that will be inefficient when the original list is really big
  • They can be really cumbersome when the criteria to pick items (in your case, if name[-5:] == 'Smith') is more complicated, or has several conditions.

Your original solution is actually more efficient for very big lists, even if we can agree it's uglier. But if you worry that you can have multiple 'John Smith', it can be fixed by deleting based on position and not on value:

names = ['Jones', 'Vai', 'Smith', 'Perez', 'Smith']

toremove = []
for pos, name in enumerate(names):
    if name[-5:] == 'Smith':
        toremove.append(pos)
for pos in sorted(toremove, reverse=True):
    del(names[pos])

print names

We can't pick a solution without considering the size of the list, but for big lists I would prefer your 2-pass solution instead of the filter or lists comprehensions

Ricardo Reyes
This doesn't work properly if you have more than one 'Smith' entry, because the additional instances to remove have been shifted due to the removal of earlier instances.And for a similar reason, this algorithm causes an exception if a second 'Smith' entry is added to the end of the list.
Miquella
@Miquella: you are right, my original post failed for multiple Smiths, I fixed it doing the delete in reverse order. Thanks.
Ricardo Reyes
+3  A: 

There are times when filtering (either using filter or a list comprehension) doesn't work. This happens when some other object is holding a reference to the list you're modifying and you need to modify the list in place.

for name in names[:]:
    if name[-5:] == 'Smith':
        names.remove(name)

The only difference from the original code is the use of names[:] instead of names in the for loop. That way the code iterates over a (shallow) copy of the list and the removals work as expected. Since the list copying is shallow, it's fairly quick.

gooli
+2  A: 

To answer your question about working with dictionaries, you should note that Python 3.0 will include dict comprehensions:

>>> {i : chr(65+i) for i in range(4)}

In the mean time, you can do a quasi-dict comprehension this way:

>>> dict([(i, chr(65+i)) for i in range(4)])

Or as a more direct answer:

dict([(key, name) for key, name in some_dictionary.iteritems if name[-5:] != 'Smith'])
Jason Baker
+10  A: 

You can also iterate backwards over the list:

for name in reversed(names):
    if name[-5:] == 'Smith':
        names.remove(name)

This has the advantage that it does not create a new list (like filter or a list comprehension) and uses an iterator instead of a list copy (like [:]).

Note that although removing elements while iterating backwards is safe, inserting them is somewhat trickier.

Xavier Martinez-Hidalgo
Solved my problem, thanks :)
Ashy
This is a really innovative and Pythonic solution. I love it!
Richo
oo pretty clever
Claudiu
+1  A: 

In the case of a set.

toRemove = set([])  
for item in mySet:  
    if item is unwelcome:  
        toRemove.add(item)  
mySets = mySet - toRemove
CashMonkey