I have a list comprehension that filters a list:

l = [obj for obj in objlist if not obj.mycond()]

but the object method mycond() can raise an exception that I must intercept. By the end of the loop I need to have collected all the errors, to show which objects caused problems, and at the same time I want to be sure that every element of the list is visited.

My solution was:

errors = []
copy = objlist[:]

for obj in copy:
    try:
        if (obj.mycond()):
            # avoiding to touch the list in the loop directly
            objlist.remove(obj) 
    except MyException as err:
        errors.append(err)
if errors:
    pass  # do something

return objlist

In an earlier post (How to delete list elements while cycling the list itself without duplicate it) I asked whether there is a better way to loop that avoids duplicating the list.

The community advised me to avoid in-place list modification and to use a list comprehension instead, which works as long as I ignore the exception problem.

Is there an alternative solution, in your view? Can I handle exceptions this way while using a list comprehension? And for this kind of situation with big lists (what should I consider big?), do I need to find another approach?

A: 

Instead of copying the list and removing elements, start with a blank list and add members as necessary. Something like this:

errors = []
newlist = []

for obj in objlist:
    try:
        if not obj.mycond():
            newlist.append(obj)
    except MyException as err:
        errors.append(err)
if errors:
    pass  # do something

return newlist

The syntax isn't as pretty, but it'll do more or less the same thing that the list comprehension does without any unnecessary removals.

Adding or removing elements anywhere other than the end of a list will be slow: when you remove something, the list needs to go through every item that comes after it and update its position, and the same goes for adding something.
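To make "update the position" concrete, here is a minimal sketch (the list contents and variable names are illustrative assumptions, not from the question). Removing an element near the front moves every later element down one slot, while appending at the end moves nothing.

```python
# Illustrative only: an element's position is simply where its reference
# sits in the list, so removing an earlier element moves every later
# reference down by one slot.
items = ["a", "b", "c", "d"]
pos_before = items.index("d")        # position of "d" before any removal

items.remove("a")                    # every reference after "a" shifts down
pos_after_remove = items.index("d")

items.append("e")                    # appending at the end shifts nothing
pos_after_append = items.index("d")
```

The shifting is what makes repeated removals from a long list expensive, whereas building a new list with append only ever touches the end.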

Davy8
It would be clearer to describe insertion and deletion in terms of the list being a series of pointers ("references") to the list items, and all the following pointers needing to be shifted up or down in memory. There is no "index" that needs to be changed for each item (unless by "index" you mean simply "position" but that's by no means clear to everyone reading your answer).
Peter Hansen
@Peter I guess I was trying to explain it from a conceptual level rather than actually how it's (most likely) implemented. If I were to implement a list from an OO perspective without regards to optimizations, that's how I'd implement it.
Davy8
@Davy8: "it needs to go through ..." perhaps you should change that "needs". How would your OO perspective implementation handle `a = b[c]`? Iterate through the `b` list looking for an item whose index is equal to `c`? BTW, in your code where it says `errors = [err]`, don't you mean `errors.append(err)`?
John Machin
@John I agree that needs might be too strong of a word, and I did mean append. I'll fix that. To go along with my proposed hypothetical I suppose I would iterate. If python didn't provide a native list or dict classes (I realize super hypothetical) , how would you implement a list in pure python? (I actually would like to know if there's an effective way to do it without pointer arithmetic, I don't know how)
Davy8
@Peter btw I did change it to say update the position rather than the index, but isn't position always going to be equal to the index? (or index+1 depending on where you start counting positions) I suppose index is a more technical term and more closely tied to the implementation, and position is more of a layman's term that doesn't imply anything about implementation
Davy8
@Davy8, if you think of Python lists as linked lists (which they are not) then it might make sense to discuss an "index" value that is stored in each node, though I've never seen an implementation do that. But Python lists are arrays, and there is *no* "index" value stored anywhere... it's effectively just a calculated value (you can get it with `list.index(item)`) representing the position of an item in the list, but it's not stored anywhere in the list structure so it does not get updated. The position of the references/pointers do get updated though, since they all get shifted in memory.
Peter Hansen
A: 

You could define a method on obj that calls obj.mycond() but also catches the exception:

class obj:

    def __init__(self):
        self.errors = []

    def mycond(self):
        pass  # whatever you have here

    def errorcatcher(self):
        try:
            return self.mycond()
        except MyException as err:
            self.errors.append(err)
            return False  # or True, depending upon what you want

l = [obj for obj in objlist if not obj.errorcatcher()]

errors = [obj.errors for obj in objlist if obj.errors]

if errors:
    pass  # do something
pwdyson
It seems you should be saying `[obj for obj in objlist if not obj.errorcatcher()]` ?
gahooa
thanks gahooa, fixed now
pwdyson
+1  A: 

A couple of comments:

First of all, the list comprehension syntax [expression for var in iterable] DOES create a copy. If you do not want to create a copy of the list, then use the generator expression (expression for var in iterable).

How do generators work? Essentially by calling next() on the generator repeatedly until a StopIteration exception is raised.
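A minimal sketch of that protocol, using only built-ins (the variable names are illustrative): a for loop keeps calling next() on the iterator and stops when StopIteration is raised.

```python
# A generator expression is lazy: values are produced one at a time.
squares = (n * n for n in [1, 2, 3])

first = next(squares)
second = next(squares)
third = next(squares)

try:
    next(squares)          # nothing left to produce
    exhausted = False
except StopIteration:
    exhausted = True       # the signal a for loop uses to stop iterating
```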

Based on your original code, it seems that you still need the filtered list as output.

So you can emulate that with little performance loss:

l = []
for obj in objlist:
   try:
      if not obj.mycond():
         l.append(obj)
   except Exception:
      pass

However, you could re-engineer that all with a generator function:

def FilterObj(objlist):
   for obj in objlist:
      try:
         if not obj.mycond():
            yield obj
      except Exception:
         pass

In that way, you can safely iterate over it without caching a list in the meantime:

for obj in FilterObj(objlist):
   obj.whatever()
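
If the errors also need to be reported, as in the question, the same kind of generator could record them instead of discarding them. This is only a sketch: `Item`, `MyException`, `filter_obj` and its `errors` parameter are hypothetical names standing in for the asker's real types.

```python
class MyException(Exception):
    """Stand-in for the asker's exception type (assumed)."""


class Item:
    """Hypothetical object whose mycond() may raise."""

    def __init__(self, name, cond=False, broken=False):
        self.name, self.cond, self.broken = name, cond, broken

    def mycond(self):
        if self.broken:
            raise MyException("mycond failed for " + self.name)
        return self.cond


def filter_obj(objlist, errors):
    """Yield objects whose mycond() is false; record failures in errors."""
    for obj in objlist:
        try:
            if not obj.mycond():
                yield obj
        except MyException as err:
            errors.append((obj, err))


objlist = [Item("a"), Item("b", cond=True), Item("c", broken=True), Item("d")]
errors = []
kept = [obj.name for obj in filter_obj(objlist, errors)]
failed = [obj.name for obj, _ in errors]
```

Because the generator is lazy, `errors` is only complete once the generator has been fully consumed.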
gahooa
+8  A: 

I would use a little auxiliary function:

def f(obj, errs):
  try: return not obj.mycond()
  except MyException as err: errs.append((obj, err))

errs = []
l = [obj for obj in objlist if f(obj, errs)]
if errs:
  emiterrorinfo(errs)

Note that this way you have in errs all the errant objects and the specific exception corresponding to each of them, so the diagnosis can be precise and complete; as well as the l you require, and your objlist still intact for possible further use. No list copy was needed, nor any changes to obj's class, and the code's overall structure is very simple.
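
For anyone who wants to try it, here is a self-contained sketch of the same idea. `Thing` and `MyException` are made-up stand-ins for the question's types, and the explicit `return False` is my addition to make the filtering of failing objects obvious (the original relies on the implicit `None`).

```python
class MyException(Exception):
    """Stand-in for the exception type from the question (assumed)."""


class Thing:
    """Hypothetical object; mycond() raises for negative values."""

    def __init__(self, value):
        self.value = value

    def mycond(self):
        if self.value < 0:
            raise MyException("negative value: %d" % self.value)
        return self.value % 2 == 0   # condition holds for even values


def f(obj, errs):
    try:
        return not obj.mycond()
    except MyException as err:
        errs.append((obj, err))
        return False   # a failing object is filtered out


objlist = [Thing(1), Thing(2), Thing(-3), Thing(5)]
errs = []
l = [obj for obj in objlist if f(obj, errs)]

kept_values = [obj.value for obj in l]
bad_values = [obj.value for obj, _ in errs]
```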

Alex Martelli