views:

853

answers:

6

When programming in Python, is it possible to reserve memory for a list that will be populated with a known number of items, so that the list will not be reallocated several times while building it? I've looked through the docs for a Python list type, and have not found anything that seems to do this. However, this type of list building shows up in a few hotspots of my code, so I want to make it as efficient as possible.

Edit: Also, does it even make sense to do something like this in a language like Python? I'm a fairly experienced programmer, but new to Python and still getting a feel for its way of doing things. Does Python internally allocate all objects in separate heap spaces, defeating the purpose of trying to minimize allocations, or are primitives like ints, floats, etc. stored directly in lists?

A: 

Just create the list at the beginning like this:

a = range(10)

Then you can assign values writing

a[7] = an object
happy_emi
This doesn't work for Python3.0 because *range* creates a range object - not a list. Either use *list(range(10))* or do it how SilentGhost described it.
Pankrat
hanks for pointing that out. I didn't know.
happy_emi
+4  A: 

you can create list of the known length like this:

>>> [None] * known_number
SilentGhost
+4  A: 

In most of everyday code you won't need such optimization.

However, when list efficiency becomes an issue, the first thing you should do is replace generic list with typed one from array module which is much more efficient.

Here's how list of 4 million floating point numbers cound be created:

import array
lst = array.array('f', [0.0]*4000*1000)
Alex Lebedev
What do you mean by "much more efficient"? `array.array` might require less memory but a Python list is faster in most (meaning those I've tried) cases.
J.F. Sebastian
In this case it even creates first a list and then from the list an array. This is not efficient.
Georg
+1  A: 

In python, all objects are allocated on the heap, but python uses a special memory allocator (so malloc won't be called every time you need a new object). There are also some optimizations for small ints and the likes, which are cached, but which values and how is implementation dependent.

David Cournapeau
+1  A: 

If you're wanting to manipulate numbers efficiently in Python then have a look at NumPy ( http://numpy.scipy.org/). It let's you do things extremely fast while still getting to use Python.

To do what your asking in NumPy you'd do something like

import numpy as np
myarray = np.zeros(4000)

which would give you an array of floating point numbers initialized to zero. You can then do very cool things like multiply whole arrays by a single factor or by other arrays and other stuff (kind of like in Matlab if you've ever used that) which is very fast (most of the actual work is happening in the highly optimized C part of the NumPy library).

If it's not arrays of numbers your after then you're probably not going to find a way to do what you want in Python. A Python list of objects is a list of points to objects internally (I think so anyway, I'm not an expert of Python internals) so it would still be allocating each of its members as you create them.

Thomas Parslow
+7  A: 

Here's four variants:

  • an incremental list creation
  • "pre-allocated" list
  • array.array()
  • numpy.zeros()

 

python -mtimeit -s"N=10**6" "a = []; app = a.append;"\
    "for i in xrange(N):  app(i);"
10 loops, best of 3: 390 msec per loop

python -mtimeit -s"N=10**6" "a = [None]*N; app = a.append;"\
    "for i in xrange(N):  a[i] = i"
10 loops, best of 3: 245 msec per loop

python -mtimeit -s"from array import array; N=10**6" "a = array('i', [0]*N)"\
    "for i in xrange(N):" "  a[i] = i"
10 loops, best of 3: 541 msec per loop

python -mtimeit -s"from numpy import zeros; N=10**6" "a = zeros(N,dtype='i')"\
    "for i in xrange(N):" "  a[i] = i"
10 loops, best of 3: 353 msec per loop

It shows that [None]*N is the fastest and array.array is the slowest in this case.

J.F. Sebastian