I need to process lots of data held in lists, so I have been looking at the best way of doing this in Python.

The main ways I've come up with are:

- list comprehensions
- generator expressions
- functional-style operations (map, filter, etc.)

I know generally list comprehensions are probably the most "Pythonic" method, but what is best in terms of performance?

A: 

Inspired by this answer: http://stackoverflow.com/questions/1247486/python-list-comprehension-vs-map , I've tweaked its tests so that generator expressions can be compared as well:

For built-ins:

$ python -mtimeit -s 'import math;xs=range(10)' 'sum(map(math.sqrt, xs))'
100000 loops, best of 3: 2.96 usec per loop
$ python -mtimeit -s 'import math;xs=range(10)' 'sum([math.sqrt(x) for x in xs])'
100000 loops, best of 3: 3.75 usec per loop
$ python -mtimeit -s 'import math;xs=range(10)' 'sum(math.sqrt(x) for x in xs)'
100000 loops, best of 3: 3.71 usec per loop

For lambdas:

$ python -mtimeit -s'xs=range(10)' 'sum(map(lambda x: x+2, xs))'
100000 loops, best of 3: 2.98 usec per loop
$ python -mtimeit -s'xs=range(10)' 'sum([x+2 for x in xs])'
100000 loops, best of 3: 1.66 usec per loop
$ python -mtimeit -s'xs=range(10)' 'sum(x+2 for x in xs)'
100000 loops, best of 3: 1.48 usec per loop

Making a list:

$ python -mtimeit -s'xs=range(10)' 'list(map(lambda x: x+2, xs))'
100000 loops, best of 3: 3.19 usec per loop
$ python -mtimeit -s'xs=range(10)' '[x+2 for x in xs]'
100000 loops, best of 3: 1.21 usec per loop
$ python -mtimeit -s'xs=range(10)' 'list(x+2 for x in xs)'
100000 loops, best of 3: 3.36 usec per loop
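
The command-line runs above can also be driven from a script, which makes it easier to rerun them on your own interpreter and data. This is only a minimal sketch: the labels, the `best_time` helper and its `number`/`repeat` defaults are made up for illustration, and wrapping each statement in a lambda adds a little call overhead, so the absolute figures will not match the ones quoted here.

```python
import math
import timeit

xs = range(10)

# Hypothetical labels; the statements mirror the "built-ins" runs above.
candidates = {
    "map + builtin": lambda: sum(map(math.sqrt, xs)),
    "list comprehension": lambda: sum([math.sqrt(x) for x in xs]),
    "generator expression": lambda: sum(math.sqrt(x) for x in xs),
}


def best_time(func, number=10_000, repeat=3):
    """Return the best per-call time in microseconds over `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e6


results = {name: best_time(func) for name, func in candidates.items()}

# Print the fastest variant first.
for name, usec in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:22s} {usec:.2f} usec per loop")
```

Swapping in your real function and a realistically sized `xs` is the part that matters; ten-element inputs mostly measure fixed overhead.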

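It appears that map is fastest when paired with a built-in function. With a lambda, both the list comprehension and the generator expression are faster than map, and the generator expression is slightly ahead of the list comprehension when the result is fed straight into sum. When you actually need a list, the list comprehension is clearly the quickest (1.21 usec, against more than 3 usec for either list(map(...)) or list() around a generator expression). Along with slightly cleaner syntax, generator expressions are lazily evaluated, so they never hold the whole result in memory at once, which saves a lot of memory on large inputs. So in the absence of specific tests for your application, you should use map with built-ins, a list comprehension when you require a list result, and a generator expression otherwise. If you're really concerned with performance, you might take a look at whether you actually require lists at all points in your program.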
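
To make the memory point concrete, here is a rough illustration, assuming CPython. The names and the size `n` are arbitrary, and `sys.getsizeof` reports only the container itself, not the integers it refers to, so the real gap is larger than what it shows.

```python
import sys

n = 100_000

squares_list = [x * x for x in range(n)]  # all n results are built up front
squares_gen = (x * x for x in range(n))   # nothing is computed until iterated

# Size of the container objects only (the list's element pointers vs. the
# small, fixed-size generator object).
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)

# Both produce the same values when consumed.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)

# The trade-off: a generator can be consumed only once.
leftover = list(squares_gen)
```

The last line is the catch to keep in mind: if you need to iterate over the results more than once, or index into them, a list is still the right choice.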

gilesc