ansaurus

Question

List Comprehensions in Python : efficient selection in a list

Answer 1

+8 A:

Sure, the difference between the following two:

[f(x) for x in list]

and this:

(f(x) for x in list)

is that the first will generate the list in memory, whereas the second is a new generator, with lazy evaluation.

So, simply write the "unfiltered" list as a generator instead. Here's your code, with the generator inline:

def myFunction(x):
    print("called for: " + str(x))
    return x * x

originalList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
limit = 10
result =   [C2 for C2 in ((myFunction(C), C) for C in originalList) if C2[0] < limit]
# result = [C2 for C2 in [(myFunction(C), C) for C in originalList] if C2[0] < limit]

Note that you will not see a difference in the printout from the two, but if you were to look at memory usage, the second statement which is commented out, will use more memory.

To do a simple change to your code in your question, rewrite unfiltered as this:

unfiltered = [ (myFunction(C),C) for C in originalList ]
             ^                                         ^
             +---------- change these to (..) ---------+
                                 |
                                 v
unfiltered = ( (myFunction(C),C) for C in originalList )

Lasse V. Karlsen 2009-08-03 14:38:39

Ok, thank you !

ThibThib 2009-08-03 14:54:44

What about using iterator?

Selinap 2009-08-03 15:22:41

Answer 2

+3 A:

Don't use a list comprehension; a normal for loop is fine here.

Kiv 2009-08-03 14:39:03

@ThibThib did mentioned that his function is quite time consuming. Therefore, loop is very unsuitable.

Selinap 2009-08-03 15:13:52

@Selinap: Why is a loop unsuitable? It does the same work as a list comprehension without the confusion of trying to save an intermediate result.

S.Lott 2009-08-03 15:38:59

@Selinap, I disagree. By using a regular for loop you can store the result of myFunction(C) and use the result the second time, which is exactly what ThibThib was looking for. That said, I like Lasse V. Karisen's answer best.

tgray 2009-08-03 15:41:01

Answer 3

+3 A:

Just compute the distances beforehand and then filter the results:

with_distances = ((myFunction(C), C) for C in originalList)
result = [C for C in with_distances if C[0] < limit]

Note: instead of building a new list, I use a generator expression to build the distance/element pairs.

unbeknown 2009-08-03 14:42:03

Is this different than Lasse V. Karlsen's answer?

Kiv 2009-08-03 14:46:11

it's the same answer. Except that there is a small typo. I guess the last line is result = [ C for C in with_distances if C[0] < limit ]Thank you for the answer

ThibThib 2009-08-03 14:54:01

Right, correcting it.

unbeknown 2009-08-03 15:02:00

Answer 4

A:

Rather than using a global variable as in your option 2, you could rely on the fact that in Python parameters are passed by object - that is, the object that is passed into your myFunction function is the same object as the one in the list (this isn't exactly the same thing as call by reference, but it's close enough).

So, if your myFunction set an attribute on the object - say, _result - you could filter by that attribute:

result = [(_result, C) for C in originalList if myFunction(C) < limit]

and your myFunction might look like this:

def myFunction(obj):
    obj._result = ... calculation ....
    return obj._result

Daniel Roseman 2009-08-03 14:42:05

Wouldn't this require the OP to write their own class? It's my understanding that C is an integer.

tgray 2009-08-03 15:44:25

Answer 5

A:

What's wrong with option 1?

"duplicate my originalList and waste some memory (the list could be quite big - more than 10,000 elements)"

10,000 elements is only 10,000 pointers to tuples that point to existing objects. Think 160K or so of memory. Hardly worth talking about.

S.Lott 2009-08-03 14:42:59

yes, of course, it is only 10,000 pointers. But if it's possible to avoid them, I prefer. It's cleaner.

ThibThib 2009-08-03 14:50:29

Answer 6

A:

Some options:

Use memoization
Use a normal for loop
Create an unfiltered list, then filter it (your option 1). The 'wasted' memory will be reclaimed by the GC very quickly - it's not something you need to worry about.

Joe Gauterin 2009-08-03 14:52:33

Does Python implement some memoization technique ?Or it's up to the programmer to implement, for example, a cache of the latest results?

ThibThib 2009-08-03 15:16:00

It's up to the programmer, but you can write a generic function that takes a function as its argument and returns a memoized version of that function.

Joe Gauterin 2009-08-07 09:40:15

Answer 7

+1 A:

Lasse V. Karlsen has an excellent reply to your question.

If your distance computation is slow, I guess your elements are polylines, or something like that, right ?

There are lots of ways to make it faster :

If the distance between bounding boxes of objects is > X, then it follows that the distance between those objects is > X. So you only need to compute distance between bounding boxes.
If you want all objects that are at a distance less than X from object A, only objects whose bounding box intersect A's bounding box enlarged by X are potential matches.

Using the second point you can probably drop lots of candidate matches and only do the slow computation when needed.

Bounding boxes must be cached beforehand.

If you really have a lot of objects you could also use space partitioning...

Or convex enclosing polys if you are in 3D

peufeu 2009-08-04 09:37:38

Thank you for your answer.I am not using polylines (only points on a sphere, where I am looking for neighbors, by computing the orthodromic distances).And I am already using these technic of computing an upper bound (but fast) of the distance, and using it to drop lots of candidate.The distance function is not very slow (1s to compute (1000^2)/2 distances on my old computer), but I just wanted not to compute the unnecessary distances.

ThibThib 2009-08-04 10:48:32

ansaurus

tags:

views:

answers:

List Comprehensions in Python : efficient selection in a list

related questions