views:

125

answers:

5

Here is a code snippet that shows the code I would like to optimize:

result = [(item, foo(item))
          for item in item_list
          if cond1(item) and cond2(foo(item))]

In the above snippet I call foo(item) twice. I can't think of a way to iterate over the list only once maintain both item and foo(item) for the conditional and the result list.

That is, I would like to keep item and foo(item) without having to loop over the list twice and without having to call foo(item) twice.

I know I can do it with a second nested list comprehension:

result = [(item, foo_item)
          for item, foo_item in [(i, foo(i)) for i in item_list]
          if cond1(item) and cond2(foo_item)]

but that appears to loop through item_list twice which I would like to avoid.

So the first example calls foo twice per list item. The second example loops through the list twice (or appears to). I'd like to loop one time and call foo once for each item.

+4  A: 

It doesn't, but here:

result = [(item, foo_item)
    for item, foo_item in ((i, foo(i)) for i in item_list)
    if cond1(item) and cond2(foo_item)]

Turning the inner list comprehension into a generator expression makes sure that we don't use an unnecessary temporary list.

Ignacio Vazquez-Abrams
That is identical to my second example except you have changed my list comprehension into a generator expression. Are you saying that my second example would only look through the list once? Or are you saying that in order for it to only loop once I need a generator expression?
James Fassett
A list comprehension creates a list. So you iterate over `item_list`, applying the function to each element in turn, and returning the results as a list. You then iterate over *this* list, making it appear as though you're iterating over `item_list` twice when in fact it's just a side effect of the algorithm.
Ignacio Vazquez-Abrams
The example with inner list will iterate through whole item_list to create a temporary list and then iterate through that list.The example with generator will iterate through item_list and for each item execute the rest of the code - exactly what you asked for.
zvonimir
So there are 2 iterations. Once for `item_list` and once for the new list generated by the inner comprehension. Which is as I suspected. I want to avoid that second iteration so that I only ever iterate over `len(list)` regardless of how many intermediate lists are created.
James Fassett
Which is what converting it to a genex does.
Ignacio Vazquez-Abrams
Is your problem that you _think_ it loops over the list twice, or that it _looks_ like it loops over the loop twice? If the former, I don't think we can help. If the latter, I think the question is a waste of time.
Chris B.
+3  A: 

Like I've been repeatedly told here, the best thing in such cases is not to use a list comprehension at all:

result = []
for item in item_list:
    if cond1(item):
        value = foo(item)
        if cond2(value):
            result.append((item, value))

But I am stubbborn, so let's see what I can come up with (and keep the comprehension) (oh, wait -- I got your code all wrong. Still - unwrapping and having intermediate variables is the straight way for not to repeat the call)

jsbueno
I'm the same. Obviously it can be written efficiently imperatively but I am interested in a functional or list-comprehension based solution.
James Fassett
+1 This is simple, readable and likely to be the fastest. Anyone care to run some benchmarks?
gnibbler
+2  A: 

How does this look?

result = [ (i, fi) for i  in item_list if cond1(i)
                   for fi in (foo(i),) if cond2(fi) ]
Karmastan
That is pretty slick actually. I wonder if the packaging and unpacking of `foo(i)` into a list is expensive. Still it is a clean way to solve the problem. I'll give it some thought.
James Fassett
I would hope that creating a list (or a single-element tuple) uses a highly-optimized code path in the Python interpreter. It has to be faster than creating and iterating through a generator. I welcome comments from someone who's actually benchmarked these suggestions.
Karmastan
Good - I was coming now to retry assigning foo[i] to some "thing" (like a locals().__setitem__ call) but this single-item for is much cleaner. (again, just for the fun of making it in one line)
jsbueno
This calculates `foo(i)` regardless of the result of `cond1(i)`. When `cond1(i)` is `False` you should avoid calculating `foo(i)`
gnibbler
@gnibbler: it's an easy fix. I was trying to preserve the way the poster originally structured the question. You're right, this is the better way to write it.
Karmastan
+1  A: 

Use generator expressions.

result = [(item, foo_item)
          for item, foo_item in ((i, foo(i)) for i in item_list)
          if cond1(item) and cond2(foo_item)]

The interpreter will go through every element exactly once, because generator expression will calculate (i, foo(i)) only when it is required by the outer loop.

Assuming that foo is expensive and has no side effects, I'd even try to do this:

result = [(item, foo_item)
          for item, foo_item in ((i, foo(i)) for i in item_list if cond1(i))
          if cond2(foo_item)]

so that foo will not be called for elements which do not pass the first condition. Actually this looks better for me when written functionally:

from itertools import imap, ifilter
result = filter((lambda i,f:cond2(f)),
           imap((lambda i:(i, foo(i))),
             ifilter(cond1, item_list)))

...but I might be subjective.

liori
It would be natural to move the conditional into the generator expression to reduce it.
James Fassett
A: 

This is one of the many reasons that we have generators:

def generator( items ):
    for item in items:
        if cond1(item):
            food = foo(item)
            if food:
                yield item, food

result = list(generator(item_list))

LCs are only good when they look good - if you have to spread them over 3 lines just to make them readable it's a bad idea.

THC4k