views:

1076

answers:

9

The Python list comprehension syntax makes it easy to filter values within a comprehension. For example:

result = [x**2 for x in mylist if type(x) is int]

Will return a list of the squares of integers in mylist. However, what if the test involves some (costly) computation and you want to filter on the result? One option is:

result = [expensive(x) for x in mylist if expensive(x)]

This will result in a list of non-"false" expensive(x) values, however expensive() is called twice for each x. Is there a comprehension syntax that allows you to do this test while only calling expensive once per x?

+8  A: 

Came up with my own answer after a minute of thought. It can be done with nested comprehensions:

result = [y for y in (expensive(x) for x in mylist) if y]

I guess that works, though I find nested comprehensions are only marginally readable

Nick
+6  A: 
result = [x for x in map(expensive,mylist) if x]

map() will return a list of the values of each object in mylist passed to expensive(). Then you can list-comprehend that, and discard unnecessary values.

This is somewhat like a nested comprehension, but should be faster (since the python interpreter can optimize it fairly easily).

Dan Udey
I like this solution. The one problem with it is that map creates a list, so you're going to have a potentially large temporary list that just gets torn down in the end.
Nick
@Nick: if creating a large list is an issue, use `itertools.imap`.
John Millikin
+2  A: 

You could always memoize the expensive() function so that calling it the second time around is merely a lookup for the computed value of x.

Here's just one of many implementations of memoize as a decorator.

yukondude
blasted race conditions ;)
rcreswick
+2  A: 

You could memoize expensive(x) (and if you are calling expensive(x) frequently, you probably should memoize it any way. This page gives an implementation of memoize for python:

http://code.activestate.com/recipes/52201/

This has the added benefit that expensive(x) may be run less than N times, since any duplicate entries will make use of the memo from the previous execution.

Note that this assumes expensive(x) is a true function, and does not depend on external state that may change. If expensive(x) does depend on external state, and you can detect when that state changes, or you know it wont change during your list comprehension, then you can reset the memos before the comprehension.

rcreswick
+7  A: 

If the calculations are already nicely bundled into functions, how about using filter and map?

result = filter (None, map (expensive, mylist))

You can use itertools.imap if the list is very large.

John Millikin
filter(expensive, mylist)in Python 3, to return a list:list(filter(expensive, mylist))
Selinap
+4  A: 

The most obvious (and I would argue most readable) answer is to not use a list comprehension or generator expression, but rather a real generator:

def gen_expensive(mylist):
    for item in mylist:
        result = expensive(item)
        if result:
            yield result

It takes more horizontal space, but it's much easier to see what it does at a glance, and you end up not repeating yourself.

Thomas Wouters
I think this is only worth doing if you're going to gen_expensive() multiple times and, even then, I think filtering the result wins out because you could see what is happening on one line, rather than having to dig up the definition of gen_expensive().
Nick
+2  A: 

This is exactly what generators are suited to handle:

result = (expensive(x) for x in mylist)
result = (do_something(x) for x in result if some_condition(x))
...
result = [x for x in result if x]  # finally, a list
  1. This makes it totally clear what is happening during each stage of the pipeline.
  2. Explicit over implicit
  3. Uses generators everywhere until the final step, so no large intermediate lists

cf: 'Generator Tricks for System Programmers' by David Beazley

Gregg Lind
+1  A: 

I will have a preference for:

itertools.ifilter(bool, (expensive(x) for x in mylist))

This has the advantage to : - avoid None as the function (will be eliminate in py 3): http://bugs.python.org/issue2186 - use only iterators.

odwl
A: 
Paddy3118