views:

235

answers:

7

I lost a little bit of time in this Python for statement:

class MyListContainer:
    def __init__(self):
        self.list = []

    def purge(self):
        for object in self.list:
            if (object.my_cond()):
                self.list.remove(object)
        return self.list

container = MyListContainer()

# now suppose both obj.my_cond() return True
obj1 = MyCustomObject(par)
obj2 = MyCustomObject(other_par)

container.list = [obj1, obj2]

# returning not an empty list but [obj2]
container.purge()

It doesn't work as I expected because when the cycle in "purge" delete the first object in list the second one is shifted to the beginning of the list and the cycle is ended.

I solved duplicating self.list before the for cycle:

...
local_list = self.list[:]
for object in local_list:
...

I suppose that the for statement stop working because I'm changing the length of the original list. Can someone clarify this point ?

And is there a more "elegant" way to solve this problem ? If I have more than few elements inside the list, duplicating it every time does not seem a good idea.

Maybe the filter() function is the right one but i whish to have some other approach if any.

I'm a newbie.


To summarize your useful answers:

  • Never edit a list you are looping
  • Duplicate the list or use list comprehensions
  • Duplicating a list could not waste your memory or in this case who's mind about it
+3  A: 

Don't try. Just don't. Make a copy or generate a new list.

Ignacio Vazquez-Abrams
Ok I promise you. But can you explain why ?
yuri
Modifying a list you're operating on disrupts the internal structures required by Python to keep track of where it is in the iteration.
Ignacio Vazquez-Abrams
+3  A: 

Just make yourself a new list:

def purge(self):
    self.list = [object for object in self.list if not object.my_cond()]
    return self.list

Reserve any optimization until you've profiled and found that this method really is the bottleneck of your application. (I bet it won't be.)

Jon-Eric
`if not object.my_cond()`
SilentGhost
@SilentGhost, Thanks! I wasn't paying full attention.
Jon-Eric
A: 

Your second solution in which you duplicate the list is the right way to go. Afterward you can just replace the old list with the duplicate if need be.

jathanism
+2  A: 

In python variables are actually labels to data. Duplicating a list is, for the most part, making a new set of pointers to the data from the first list. Don't feel too bad about it.

List comprehensions are your friend.

e.g.

>>> a = range(20)
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> [ x for x in a if x % 2 == 0 ]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
MattH
+1  A: 

Filter (or list comprehension) IS the way to go. If you want to do it inplace, something like this would work:

purge = []
for i,object in enumerate(self.list):
    if object.mycond()
        purge.append(i)
for i in reversed(purge):
    del self.list[i]

Or alternatively, the purge list can be made with a comprehension, a shortcut version looks like:

for i in reversed([ i for (i,o) in enumerate(self.list) if o.mycond() ]):
    del self.list[i]
ondra
I wonder why would *in place* modification be of any importance.
SilentGhost
I prefer the first approach because I can use a try clause inside the for loop to be sure to finish the loop. mycond() could raise an error.
yuri
In place modification can be important in lower-level languages, for performance and memory reasons. But it's usually irrelevant in Python--the overhead of Python itself swamps it.
Glenn Maynard
Note that this solution is O(n^2); for example, create a list `[1,2,3,4]*1000000` and delete all "2" items. A simple list comprehension finishes on my system in one second; these ran for ten seconds before I got tired of waiting and would probably take about fifteen minutes. Don't do this; just do l[:] = [i for i in self.list if not i.mycond()].
Glenn Maynard
I think I understand but then I've got another question: http://stackoverflow.com/questions/2233975/can-i-catch-error-in-a-list-comprehensions-to-be-sure-to-loop-all-the-list-items
yuri
A: 

It's perfectly safe to shorten the list in place if you do it in reverse!

>>> a=range(20)
>>> for i in reversed(range(len(a))):
...     if a[i]%2: del a[i]
... 
>>> a
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Another way is to reassign the whole slice

>>> a=range(20)
>>> a[:]=(x for x in a if not x%2)
>>> a
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

If the items in the list are unique, this works too

>>> a=range(20)
>>> for item in reversed(a):
...  if item%2: a.remove(item)
... 
>>> a
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Here is some more explanation in response to yuri's comment

Suppose we have

>>> a=[0,1,2,3,4,5]

Now trying naively to delete the 3rd and 4th items

>>> del a[3]
>>> del a[4]
>>> a
[0, 1, 2, 4] # didn't work because the position of all the item with index >=3 was changed

However if we do the del's in the opposite order

>>> a=[0,1,2,3,4,5]
>>> del a[4]
>>> del a[3]
>>> a
[0, 1, 2, 5] # this is the desired result

Now extend that idea over a for loop with a removal condition, and you see that removal from the live list is possible

gnibbler
I suppose first and third example works as expected because reversed() is duplicating the list internally. So you are not shortening the original one directly
yuri
@yuri, reversed on a list doesn't copy the list, it returns a listreverseiterator object. You are in fact shortening the list as you iterate over it, but since you start removing from the end of the list, the index always points to the stable part of the list
gnibbler
@yuri, I tried to add some more explanation. Assuming your list items are unique, changing your code to `for object in reversed(self.list):` will work fine - and it doesn't make a copy the list
gnibbler
A: 
indeces = []
minus = 0

for i in range(self.list):
    if cond(self.list[i]):
        indeces.append(i)

for i in indeces:
    self.list = self.list[:(i-minus)].extend(self.list[i-minus+1:])
inspectorG4dget