ansaurus

Question

Python 3: Most efficient way to create a [func(i) for i in range(N)] list comprehension

Answer 1

A:

Using list comprehension to accomplish what you're trying to do would be more pythonic way to do it. Despite performance penalty:).

pajton 2010-03-13 21:11:55

Answer 2

+3 A:

There is no difference in computational complexity between using an autoresizing array and preallocating an array. At worst, it costs about O(2N). See here:

http://stackoverflow.com/questions/200384/constant-amortized-time/249695#249695

The cost of the function calls plus whatever happens in your function is going to make this extra n insignificant. This isn't something you should worry about. Just use the list comprehension.

Matt Anderson 2010-03-13 21:14:54

O(2n), of course, isn't a thing.

Mike Graham 2010-03-13 21:31:53

Of course. 2N --> O(N).

Matt Anderson 2010-03-13 21:35:11

Sure it's a thing. O(2n) is equal to O(n).

jleedev 2010-03-13 21:36:17

I didn't know it was that cheap, thanks for the insight, I will look at it! But the constant overhead of allocating is still something I seek to avoid.

mejiwa 2010-03-13 21:44:15

Answer 3

+2 A:

Going to have to disagree with Javier here...

With the following code:

print '%5s | %6s | %6s' % ('N', 'l.comp', 'manual')
print '------+--------+-------'
for N in 10, 100, 1000, 10000:
    num_iter = 1000000 / N

    # Time for list comprehension.
    t1 = timeit.timeit('[func(i) for i in range(N)]', setup='N=%d;func=lambda x:x' % N, number=num_iter)

    # Time to build manually.
    t2 = timeit.timeit('''mylist = [None]*N
for i in range(N): mylist[i] = func(i)''', setup='N=%d;func=lambda x:x' % N, number=num_iter)

    print '%5d | %2.4f | %2.4f' % (N, t1, t2)

I get the following table:

    N | l.comp | manual
------+--------+-------
   10 | 0.3330 | 0.3597
  100 | 0.2371 | 0.3138
 1000 | 0.2223 | 0.2740
10000 | 0.2185 | 0.2771

From this table it appears that the list comprehension faster than pre-allocation in every case of these varying lengths.

Mike Boers 2010-03-13 21:29:04

Answer 4

+3 A:

If you use the timeit module, you may come to the opposite conclusion: list comprehension is faster than preallocation:

f=lambda x: x+1
N=1000000
def lc():
    return [f(i) for i in range(N)]
def prealloc():
    mylist = [None]*N
    for i in range(N): mylist[i]=f(i)
    return mylist
def xr():
    return map(f,xrange(N))

if __name__=='__main__':
    lc()

Warning: These are the results on my computer. You should try these tests yourself, as your results may be different depending on your version of Python and your hardware. (See the comments.)

% python -mtimeit -s"import test" "test.prealloc()"
10 loops, best of 3: 370 msec per loop
% python -mtimeit -s"import test" "test.lc()"
10 loops, best of 3: 319 msec per loop
% python -mtimeit -s"import test" "test.xr()"
10 loops, best of 3: 348 msec per loop

Note that unlike Javier's answer, I include mylist = [None]*N as part of the code timeit is to time when using the "pre-allocation" method. (It's not just part of the setup, since it is code one would need only if using pre-allocation.)

PS. the time module (and time (unix) command) can give unreliable results. If you wish to time Python code, I'd suggest sticking with the timeit module.

unutbu 2010-03-13 21:29:21

This is a lot faster `map(f, xrange(N))`.

jleedev 2010-03-13 21:33:13

@jleedev: I added `xr` to test `map(f,xrange(N))`. It appears with CPython 2.6, list comprehension is faster than map.

unutbu 2010-03-13 21:37:41

Strange, I'm using cpython 2.6.1 (on darwin), and `xr` is about 20% faster than `lc`.

jleedev 2010-03-13 21:40:28

@jleedev: Hm, that is interesting. Perhaps that's a good reminder that what is considered fastest can depend on many things, including hardware.

unutbu 2010-03-13 21:45:31

Just tried it on 2.5.2 in Linux on a Pentium III, and got 3/2.63/2.82 seconds, respectively. Like many questions, the answer is "it depends", although manually preallocating an array in Python is generally not useful.

jleedev 2010-03-13 21:49:32

prealloc/lc/xr - `240/192/148` (python 2.6.4), `244/216/200` (3.1.1) In both version `map` is slightly faster.

J.F. Sebastian 2010-03-13 22:00:52

Answer 5

+1 A:

Interesting question. As of the following test, it seems that preallocation does not improve performance in the current CPython implementation (Python 2 code but result ranking is the same, except that there's no xrange in Python 3):

N = 5000000

def func(x):
    return x**2

def timeit(fn):
    import time
    begin = time.time()
    fn()
    end = time.time()
    print "%-18s: %.5f seconds" % (fn.__name__, end - begin)

def normal():
    mylist = [func(i) for i in range(N)]

def normalXrange():
    mylist = [func(i) for i in xrange(N)]

def pseudoPreallocated():
    mylist = [None] * N
    for i in range(N): mylist[i] = func(i)

def preallocated():
    mylist = [None for i in range(N)]
    for i in range(N): mylist[i] = func(i)

def listFromGenerator():
    mylist = list(func(i) for i in range(N))

def lazy():
    mylist = (func(i) for i in xrange(N))



timeit(normal)
timeit(normalXrange)
timeit(pseudoPreallocated)
timeit(preallocated)
timeit(listFromGenerator)
timeit(lazy)

Results (ranking in parentheses):

normal            : 7.57800 seconds (2)
normalXrange      : 7.28200 seconds (1)
pseudoPreallocated: 7.65600 seconds (3)
preallocated      : 8.07800 seconds (5)
listFromGenerator : 7.84400 seconds (4)
lazy              : 0.00000 seconds

but with psyco.full():

normal            : 7.25000 seconds  (3)
normalXrange      : 7.26500 seconds  (4)
pseudoPreallocated: 6.76600 seconds  (1)
preallocated      : 6.96900 seconds  (2)
listFromGenerator : 10.50000 seconds (5)
lazy              : 0.00000 seconds

As you can see, pseudo-preallocation is faster with psyco. In any case, there's not much of a difference between the xrange solution (which I'd recommend) and the other solutions. If you don't process all elements of the list later, you could also use the lazy method (shown in the code above) which will create a generator that produces elements by the time you iterate over it.

AndiDog 2010-03-13 21:32:22

I think in Python 3 range is the same as xrange, and xrange was dropped, so I'm already using xrange somehow. And for my needs I need access to all elements in the list, I even rely on the execution of func(i). But thank's for all these benchmarks, really informative.

mejiwa 2010-03-13 21:41:31

@mejiwa: Yes, Python 3's range is now similar to Python 2's xrange. Added a comment in my answer. If you compare the benchmark with Python 3.1 you'll notice that it only needs 2-3 seconds instead of 6-7 which shows how much the implementation was improved. Maybe you want to try Unladen Swallow (might be even faster).

AndiDog 2010-03-13 21:46:27

The motivation for my question was more to know what strategy to choose in my future code, more or less regardless of the python implementation, rather than optimizing a single problem. Actually I can't even choose the implementation because it's for a blender export script.But Unladen Swallow seems interesting for personal use projects, thanks for the hint :)

mejiwa 2010-03-13 21:56:15

Answer 6

+5 A:

Somebody wrote: """Python is smart enough. As long as the object you're iterating over has a __len__ or __length_hint__ method, Python will call it to determine the size and preallocate the array."""

As far as I can tell, there is no preallocation in a list comprehension. Python has no way of telling from the size of the INPUT what the size of the OUTPUT will be.

Look at this Python 2.6 code:

>>> def foo(func, iterable):
...     return [func(i) for i in iterable]
...
>>> import dis; dis.dis(foo)
  2           0 BUILD_LIST               0 #### build empty list
              3 DUP_TOP
              4 STORE_FAST               2 (_[1])
              7 LOAD_FAST                1 (iterable)
             10 GET_ITER
        >>   11 FOR_ITER                19 (to 33)
             14 STORE_FAST               3 (i)
             17 LOAD_FAST                2 (_[1])
             20 LOAD_FAST                0 (func)
             23 LOAD_FAST                3 (i)
             26 CALL_FUNCTION            1
             29 LIST_APPEND      #### stack[-2].append(stack[-1]); pop()
             30 JUMP_ABSOLUTE           11
        >>   33 DELETE_FAST              2 (_[1])
             36 RETURN_VALUE

It just builds an empty list, and appends whatever the iteration delivers.

Now look at this code, which has an 'if' in the list comprehension:

>>> def bar(func, iterable):
...     return [func(i) for i in iterable if i]
...
>>> import dis; dis.dis(bar)
  2           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_FAST               2 (_[1])
              7 LOAD_FAST                1 (iterable)
             10 GET_ITER
        >>   11 FOR_ITER                30 (to 44)
             14 STORE_FAST               3 (i)
             17 LOAD_FAST                3 (i)
             20 JUMP_IF_FALSE           17 (to 40)
             23 POP_TOP
             24 LOAD_FAST                2 (_[1])
             27 LOAD_FAST                0 (func)
             30 LOAD_FAST                3 (i)
             33 CALL_FUNCTION            1
             36 LIST_APPEND
             37 JUMP_ABSOLUTE           11
        >>   40 POP_TOP
             41 JUMP_ABSOLUTE           11
        >>   44 DELETE_FAST              2 (_[1])
             47 RETURN_VALUE
>>>

The same code, plus some code to avoid the LIST_APPEND.

In Python 3.X, you need to dig a little deeper:

>>> import dis
>>> def comprehension(f, iterable): return [f(i) for i in iterable]
...
>>> dis.dis(comprehension)
  1           0 LOAD_CLOSURE             0 (f)
              3 BUILD_TUPLE              1
              6 LOAD_CONST               1 (<code object <listcomp> at 0x00C4B8D
8, file "<stdin>", line 1>)
              9 MAKE_CLOSURE             0
             12 LOAD_FAST                1 (iterable)
             15 GET_ITER
             16 CALL_FUNCTION            1
             19 RETURN_VALUE
>>> dis.dis(comprehension.__code__.co_consts[1])
  1           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                18 (to 27)
              9 STORE_FAST               1 (i)
             12 LOAD_DEREF               0 (f)
             15 LOAD_FAST                1 (i)
             18 CALL_FUNCTION            1
             21 LIST_APPEND              2
             24 JUMP_ABSOLUTE            6
        >>   27 RETURN_VALUE
>>>

It's the same old schtick: start off with building an empty list, then iterate over the iterable, appending to the list as required. I see no preallocation here.

The optimisation that you are thinking about is used inside a single opcode e.g. the implementation of list.extend(iterable) can preallocate if iterable can accurately report its length. list.append(object) is given a single object, not an iterable.

John Machin 2010-03-13 23:54:40

Thanks for showing me how to disassemble in python, that will help me a lot in the future. But it seems you're using Python 2, in Python 3 I get different output. It doesn't fit into this comment, I will edit my question shortly to show my findings.

mejiwa 2010-03-14 12:04:50

Ah, gotcha! Ok, that convinced me, flagging your answer as accepted answer. As for the "I assume", "probably", "in fact" things: Seems I have to polish my english a bit :-) I will remove the misleading information I added.

mejiwa 2010-03-14 14:45:25

@Daniel: glad you're still here, thought I'v scared you away by accepting this answer.@John: Just to be 100% sure: Do you have any evidence that list.extend(iterable) actually preallocates if the iterable can report its length? No prob if you can't, it's great already that you persisted to make me accept the correct answer :)

mejiwa 2010-03-14 15:12:33

@mejiwa: `http://svn.python.org/view/python/branches/py3k/Objects/listobject.c?view=markup` Ctrl-F search for `_PyObject_LengthHint`

John Machin 2010-03-14 15:27:02

@Daniel: anonymised ;-)

John Machin 2010-03-14 15:29:53

@John: perfect!

mejiwa 2010-03-14 15:35:10

ansaurus

tags:

views:

answers:

Python 3: Most efficient way to create a [func(i) for i in range(N)] list comprehension

related questions