Hi,

I recently posted a question that used a lambda function, and in a reply someone mentioned that lambda is going out of favor and that list comprehensions should be used instead. I am relatively new to Python. I ran a simple test:

import time

S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in S for y in T if x==y]
time2 = time.time()
print 'time diff [x for x in S for y in T if x==y]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N
#
#
#http://snipt.net/voyeg3r/python-intersect-lists/
time1 = time.time()
N = [val for val in S if val in T]
time2 = time.time()
print 'time diff [val for val in S if val in T]=', time2-time1
#print N
#
#
time1 = time.time()
N= list(set(S) & set(T))
time2 = time.time()
print 'time diff list(set(S) & set(T))=', time2-time1
#print N  #the results will be unordered as compared to the other ways!!!
#
#
time1 = time.time()
N=[]
for x in S:
    for y in T:
        if x==y:
            N.append(x)
time2 = time.time()
print 'time diff using traditional for loop', time2-time1
#print N

They all print the same N (except the set version, whose result is unordered), so I commented the print statements out, but the resulting time differences were interesting over repeated tests, as seen in this one example:

time diff [x for x in S for y in T if x==y]= 54.875
time diff filter(lambda x:x in S,T)= 0.391000032425
time diff [val for val in S if val in T]= 12.6089999676
time diff list(set(S) & set(T))= 0.125
time diff using traditional for loop 54.7970001698

So while I find list comprehensions on the whole easier to read, there seem to be some performance issues, at least in this example.

So, two questions:

  1. Why is lambda etc being pushed aside?

  2. For the list comprehension ways, is there a more efficient implementation, and how would you KNOW it's more efficient without testing? I mean, lambda/map/filter was supposed to be less efficient because of the extra function calls, but it seems to be MORE efficient.

Paul

+1  A: 

Your list comprehension and lambda are doing different things; the list comprehension matching the lambda would be [val for val in T if val in S].

Efficiency isn't the reason why list comprehensions are preferred (though they actually are slightly faster in almost all cases). The reason they are preferred is readability.

Try it with a smaller loop body and larger loops: for example, make T a set and iterate over S. In that case, on my machine the list comprehension is nearly twice as fast.
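A minimal sketch of that suggestion, reusing the S and T from the question (the T_set name is just illustrative):

T_set = set(T)  # membership tests on a set are O(1) on average, vs O(len(T)) for a list
N = [val for val in S if val in T_set]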

Ants Aasma
+9  A: 

Your tests are doing very different things. With S being 1000000 elements and T being 300:

[x for x in S for y in T if x==y]= 54.875

This option does 300000000 equality comparisons.

filter(lambda x:x in S,T)= 0.391000032425

This option does 300 linear searches through S.

[val for val in S if val in T]= 12.6089999676

This option does 1000000 linear searches through T.

list(set(S) & set(T))= 0.125

This option does two set constructions and one set intersection.


The differences in performance between these options is much more related to the algorithms each one is using, rather than any difference between list comprehensions and lambda.
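To make that concrete, here are the same options annotated with rough operation counts (a sketch using the question's S and T, where len(S) == n == 1000000 and len(T) == m == 300):

N = [x for x in S for y in T if x==y]  # n*m equality tests: 300000000
N = filter(lambda x:x in S,T)          # m scans of S; each stops at the first match and runs in C
N = [val for val in S if val in T]     # n scans of T, each up to m comparisons
N = list(set(S) & set(T))              # roughly O(n + m): build two hash sets, intersect them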

Greg Hewgill
Ah, good point. Order(?) type stuff. Where would be the best place to read up on the O() calcs of these types of operations?
TallPaul
The O() of the first 3 cases is the same: O(n*m) = O(m*n). However, some of them are doing more work in Python and others are doing more work in C.
gnibbler
As it turns out, all of those (except the last one) are doing 300000000 "operations". From a theoretical perspective, they could be considered the same. The differences come more from a computer engineering than a computer science perspective: they have more to do with the choices you've made in the way you look through each list. Experimentation and a good grounding in what each expression is actually doing will be most helpful here.
Greg Hewgill
+2  A: 

Sets are the correct solution for this. However, try swapping S and T and see how long it takes!

filter(lambda x:x in T,S)

$ python -m timeit -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' 'filter(lambda x:x in S,T)'
10 loops, best of 3: 485 msec per loop
$ python -m timeit -r1 -n1 -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' 'filter(lambda x:x in T,S)'
1 loops, best of 1: 19.6 sec per loop

So you see that the order of S and T is quite important.

Changing the order of the list comprehension to match the filter gives

$ python -m timeit  -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' '[x for x in T if x in S]'
10 loops, best of 3: 441 msec per loop

So in fact the list comprehension is slightly faster than the lambda on my computer.

gnibbler
time diff filter(lambda x:x in T,S)= 12.6879999638
Thanks! The light switch got turned on with Greg's post. I think I need to read up on how the operations are performed.
TallPaul
This doesn't answer his question--he's not asking about sets vs. the others. He probably shouldn't have confused the question by including them.
Glenn Maynard
@Glenn, I thought the question was more about when/if lambda is faster than other approaches. I think it's quite important if there are cases where lambda is, say, 10% faster than the alternatives.
gnibbler
+3  A: 

First of all, test like this:

import timeit

S=[x for x in range(10000)]
T=[y**2 for y in range(30)]

print "v1", timeit.Timer('[x for x in S for y in T if x==y]',
       'from __main__ import S,T').timeit(100)
print "v2", timeit.Timer('filter(lambda x:x in S,T)',
       'from __main__ import S,T').timeit(100)
print "v3", timeit.Timer('[val for val in T if val in S]',
       'from __main__ import S,T').timeit(100)
print "v4", timeit.Timer('list(set(S) & set(T))',
       'from __main__ import S,T').timeit(100)

And basically you are doing different things each time you test. If you rewrite the list comprehension, for example, as

[val for val in T if val in S]

performance would be on par with the 'lambda/filter' construct.

ChristopheD
Thanks for the note on timeit, I'll remember that!
TallPaul
+1  A: 

Your profiling is done wrong. Take a look at the timeit module and try again.

lambda defines anonymous functions. Their main problem is that many people don't know the whole Python library and use them to re-implement functions that already exist in the operator, functools, etc. modules (and are much faster).

List comprehensions have nothing to do with lambda. They are equivalent to the standard filter and map functions from functional languages. LCs are preferred because they can be used as generators too, not to mention readability.
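For example, a quick sketch (reduce is a builtin in Python 2, and operator.mul is in the standard library):

import operator

lst = [1, 2, 3, 4]
print reduce(operator.mul, lst)        # 24 -- no lambda needed
print reduce(lambda x, y: x * y, lst)  # 24 as well, but slower and noisier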

THC4k
Thanks THC4k! I'll spend more time looking into generators!
TallPaul
A: 

This is pretty fast:

def binary_search(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        midval = a[mid]
        if midval < x:
            lo = mid+1
        elif midval > x: 
            hi = mid
        else:
            return mid
    return -1

time1 = time.time()
N = [x for x in T if binary_search(S, x) >= 0]
time2 = time.time()
print 'time diff binary search=', time2-time1

Simply: fewer comparisons, less time.

Harriv
Binary search copy-pasted from here: http://stackoverflow.com/questions/212358/binary-search-in-python
Harriv
Also note that S should be an ordered list when using binary search.
Harriv
Python already has the bisect module with that code in it.
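For instance, a sketch using bisect (the bin_contains helper name is just illustrative, and like the hand-rolled version it requires S to be sorted):

import bisect

def bin_contains(a, x):
    # a must be sorted in ascending order
    i = bisect.bisect_left(a, x)
    return i < len(a) and a[i] == x

N = [x for x in T if bin_contains(S, x)]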
gnibbler
+7  A: 

When I fix your code so that the list comprehension and the call to filter are actually doing the same work, things change a whole lot:

import time

S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in T if x in S]
time2 = time.time()
print 'time diff [x for x in T if x in S]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N

Then the output is more like:

time diff [x for x in T if x in S]= 0.414485931396
time diff filter(lambda x:x in S,T)= 0.466315984726

So the list comprehension's time is generally pretty close to, and usually less than, the lambda expression's.

The reason lambda expressions are being phased out is that many people think they are a lot less readable than list comprehensions. I sort of reluctantly agree.

Omnifarious
+1: fewer (slow) lambdas would be better.
S.Lott
Thanks for the "N=[x for x in T if x in S]" re-write, and for the timing. Makes sense!
TallPaul
A: 

List comprehensions can make a bigger difference if you have to process your filtered results. In your case you just build a list, but if you had to do something like this:

n = [f(i) for i in S if some_condition(i)]

you would gain from LC optimization over this:

n = map(f, filter(some_condition, S))

simply because the latter has to build an intermediate list (or tuple, or string, depending on the nature of S). As a consequence you will also notice a different impact on the memory used by each method; the LC's will stay lower.
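In Python 2 the itertools module offers lazy versions of both, which avoid materializing the intermediate list (a sketch; f and some_condition are the hypothetical functions from above):

from itertools import imap, ifilter

n = list(imap(f, ifilter(some_condition, S)))  # only the final list is built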

The lambda in itself does not matter.

RedGlyph
Thanks, RedGlyph! I'll try some tests on filtering.
TallPaul
+1  A: 

Many people have already pointed out that you're comparing apples with oranges, etc., etc. But I think nobody's shown a really simple comparison -- list comprehension vs. map plus lambda, with little else to get in the way -- and that might be:

$ python -mtimeit -s'L=range(1000)' 'map(lambda x: x+1, L)'
1000 loops, best of 3: 328 usec per loop
$ python -mtimeit -s'L=range(1000)' '[x+1 for x in L]'
10000 loops, best of 3: 129 usec per loop

Here, you can see very sharply the cost of lambda -- about 200 microseconds, which in the case of a sufficiently simple operation such as this one swamps the operation itself.

Numbers are very similar with filter of course, since the problem is not filter or map, but rather the lambda itself:

$ python -mtimeit -s'L=range(1000)' '[x for x in L if not x%7]'
10000 loops, best of 3: 162 usec per loop
$ python -mtimeit -s'L=range(1000)' 'filter(lambda x: not x%7, L)'
1000 loops, best of 3: 334 usec per loop

No doubt the fact that lambda can be less clear, or its weird connection with Sparta (Spartans had a lambda, for "Lakedaimon", painted on their shields -- this suggests that lambda is pretty dictatorial and bloody;-), has at least as much to do with its slowly falling out of fashion as its performance costs do. But the latter are quite real.

Alex Martelli
Thanks, Alex! That's quite useful!
TallPaul
+4  A: 

Q: Why is lambda etc being pushed aside?

A: List comprehensions and generator expressions are generally considered to be a nice mix of power and readability. The pure functional-programming style where you use map(), reduce(), and filter() with functions (often lambda functions) is considered not as clear. Also, Python has added built-in functions that nicely handle all the major uses for reduce().

Suppose you wanted to sum a list. Here are two ways of doing it.

lst = range(10)
print reduce(lambda x, y: x + y, lst)

print sum(lst)

Sign me up as a fan of sum() and not a fan of reduce() to solve this problem. Here's another, similar problem:

lst = range(10)
print reduce(lambda x, y: bool(x or y), lst)

print any(lst)

Not only is the any() solution easier to understand, but it's also much faster; it has short-circuit evaluation, such that it will stop evaluating as soon as it has found any true value. The reduce() has to crank through the entire list. This performance difference would be stark if the list was a million items long, and the first item evaluated true. By the way, any() was added in Python 2.5; if you don't have it, here is a version for older versions of Python:

def any(iterable):
    for x in iterable:
        if x:
            return True
    return False

Suppose you wanted to make a list of squares of even numbers from some list.

lst = range(10)
print map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst))

print [x**2 for x in lst if x % 2 == 0]

Now suppose you wanted to sum that list of squares.

lst = range(10)
print sum(map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst)))

# list comprehension version of the above
print sum([x**2 for x in lst if x % 2 == 0])

# generator expression version; note the lack of '[' and ']'
print sum(x**2 for x in lst if x % 2 == 0)

The generator expression actually just returns an iterable object. sum() takes the iterable and pulls values from it, one by one, summing as it goes, until all the values are consumed. This is the most efficient way you can solve this problem in Python. In contrast, the map() solution, and the equivalent solution with a list comprehension inside the call to sum(), must first build a list; this list is then passed to sum(), used once, and discarded. The time to build the list and then delete it again is just wasted. (EDIT: and note that the version with both map and filter must build two lists, one built by filter and one built by map; both lists are discarded.)

By composing map(), filter(), and reduce() in combination with lambda functions, you can do many powerful things. But Python has idiomatic ways to solve the same problems which are simultaneously better performing and easier to read and understand.

steveha
Steve, thanks! I learned a lot from this simple example and from your great teaching. It's going to be interesting to learn the best and most efficient ways of doing things in Python. It reminds me somewhat of APL in its constructs, and I think trying to 'master' Python might be a good thing for me to spend time doing.
TallPaul
You're welcome. :-)
steveha