views:

433

answers:

7

Hi

Let's suppose that I have a list of elements, and I want to select only some of them, according to a certain function (for example a distance to an other element).

I want to have as a result a list of tuple, with the distance and the element. So, I wrote the following code

result = [ ( myFunction(C), C) for C in originalList if myFunction(C) < limit ]

But myFunction is a very time-consuming function, and the originalList quite big. So doing like that, myFunction will be call twice for every selected element.

So, is there a way to avoid this ??

I have two other possibilities, but they are not so good:

  1. the first one, is to create the unfiltered list

    unfiltered = [ (myFunction(C),C) for C in originalList ]
    

    and then sort it

    result = [ (dist,C) for dist,C in unfiltered if dist < limit ]
    

    but in that case, I duplicate my originalList and waste some memory (the list could be quite big - more than 10,000 elements)

  2. the second one is tricky and not very pythonic, but efficient (the best we can do, since the function should be evaluated once per element). myFunction stores it last
    result in a global variable (lastResult for example), and this value is re-used in the List comprehension

    result = [ (lastResult,C) for C in originalList if myFunction(C) < limit ]
    

Do you have any better idea to achieve that, in an efficient and pythonic way ??

Thanks for your answers.

+8  A: 

Sure, the difference between the following two:

[f(x) for x in list]

and this:

(f(x) for x in list)

is that the first will generate the list in memory, whereas the second is a new generator, with lazy evaluation.

So, simply write the "unfiltered" list as a generator instead. Here's your code, with the generator inline:

def myFunction(x):
    print("called for: " + str(x))
    return x * x

originalList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
limit = 10
result =   [C2 for C2 in ((myFunction(C), C) for C in originalList) if C2[0] < limit]
# result = [C2 for C2 in [(myFunction(C), C) for C in originalList] if C2[0] < limit]

Note that you will not see a difference in the printout from the two, but if you were to look at memory usage, the second statement which is commented out, will use more memory.

To do a simple change to your code in your question, rewrite unfiltered as this:

unfiltered = [ (myFunction(C),C) for C in originalList ]
             ^                                         ^
             +---------- change these to (..) ---------+
                                 |
                                 v
unfiltered = ( (myFunction(C),C) for C in originalList )
Lasse V. Karlsen
Ok, thank you !
ThibThib
What about using iterator?
Selinap
+3  A: 

Don't use a list comprehension; a normal for loop is fine here.

Kiv
@ThibThib did mentioned that his function is quite time consuming. Therefore, loop is very unsuitable.
Selinap
@Selinap: Why is a loop unsuitable? It does the same work as a list comprehension without the confusion of trying to save an intermediate result.
S.Lott
@Selinap, I disagree. By using a regular for loop you can store the result of myFunction(C) and use the result the second time, which is exactly what ThibThib was looking for. That said, I like Lasse V. Karisen's answer best.
tgray
+3  A: 

Just compute the distances beforehand and then filter the results:

with_distances = ((myFunction(C), C) for C in originalList)
result = [C for C in with_distances if C[0] < limit]

Note: instead of building a new list, I use a generator expression to build the distance/element pairs.

unbeknown
Is this different than Lasse V. Karlsen's answer?
Kiv
it's the same answer. Except that there is a small typo. I guess the last line is result = [ C for C in with_distances if C[0] < limit ]Thank you for the answer
ThibThib
Right, correcting it.
unbeknown
A: 

Rather than using a global variable as in your option 2, you could rely on the fact that in Python parameters are passed by object - that is, the object that is passed into your myFunction function is the same object as the one in the list (this isn't exactly the same thing as call by reference, but it's close enough).

So, if your myFunction set an attribute on the object - say, _result - you could filter by that attribute:

result = [(_result, C) for C in originalList if myFunction(C) < limit]

and your myFunction might look like this:

def myFunction(obj):
    obj._result = ... calculation ....
    return obj._result
Daniel Roseman
Wouldn't this require the OP to write their own class? It's my understanding that C is an integer.
tgray
A: 

What's wrong with option 1?

"duplicate my originalList and waste some memory (the list could be quite big - more than 10,000 elements)"

10,000 elements is only 10,000 pointers to tuples that point to existing objects. Think 160K or so of memory. Hardly worth talking about.

S.Lott
yes, of course, it is only 10,000 pointers. But if it's possible to avoid them, I prefer. It's cleaner.
ThibThib
A: 

Some options:

  • Use memoization
  • Use a normal for loop
  • Create an unfiltered list, then filter it (your option 1). The 'wasted' memory will be reclaimed by the GC very quickly - it's not something you need to worry about.
Joe Gauterin
Does Python implement some memoization technique ?Or it's up to the programmer to implement, for example, a cache of the latest results?
ThibThib
It's up to the programmer, but you can write a generic function that takes a function as its argument and returns a memoized version of that function.
Joe Gauterin
+1  A: 

Lasse V. Karlsen has an excellent reply to your question.

If your distance computation is slow, I guess your elements are polylines, or something like that, right ?

There are lots of ways to make it faster :

  • If the distance between bounding boxes of objects is > X, then it follows that the distance between those objects is > X. So you only need to compute distance between bounding boxes.

  • If you want all objects that are at a distance less than X from object A, only objects whose bounding box intersect A's bounding box enlarged by X are potential matches.

Using the second point you can probably drop lots of candidate matches and only do the slow computation when needed.

Bounding boxes must be cached beforehand.

If you really have a lot of objects you could also use space partitioning...

Or convex enclosing polys if you are in 3D

peufeu
Thank you for your answer.I am not using polylines (only points on a sphere, where I am looking for neighbors, by computing the orthodromic distances).And I am already using these technic of computing an upper bound (but fast) of the distance, and using it to drop lots of candidate.The distance function is not very slow (1s to compute (1000^2)/2 distances on my old computer), but I just wanted not to compute the unnecessary distances.
ThibThib