Code like this often comes up:

l = []
while foo:
    #baz
    l.append(bar)
    #qux

This is really slow if you're about to append thousands of elements to your list, as the list will have to be repeatedly resized as it grows. (I understand that lists aren't just wrappers around some array-type-thing, but something more complicated. I think this still applies, though; let me know if not.)
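
(For what it's worth, you can watch that resizing with sys.getsizeof, which reports the bytes CPython currently has allocated for the list object; a small sketch, assuming CPython 2.6 or later:)

import sys

# Watch the allocated size of the list as elements are appended; in CPython
# the number only jumps occasionally, because the list over-allocates when
# it has to grow.
l = []
last = sys.getsizeof(l)
print("len=%d allocated=%d bytes" % (len(l), last))
for i in range(50):
    l.append(i)
    size = sys.getsizeof(l)
    if size != last:
        print("len=%d allocated=%d bytes" % (len(l), size))
        last = size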

In Java, you can create an ArrayList with an initial capacity. If you have some idea how big your list will be, this will be a lot more efficient.

I understand that code like this can often be refactored into a list comprehension. If the for/while loop is very complicated, though, this is infeasible. Is there an equivalent for us Python programmers?

+10  A: 

Python lists have no built-in pre-allocation. If you really need to make a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this:

l = [None] * 1000 # Make a list of 1000 Nones
for i in xrange(1000):
    # baz
    l[i] = bar
    # qux

Perhaps you could avoid the list by using a generator instead:

def my_things():
    while foo:
        #baz
        yield bar
        #qux

for thing in my_things():
    # do something with thing

This way, the list is never stored in memory at all; its values are merely generated as needed.
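
If a caller really does need all the values at once, the generator above can still be materialized on demand, for example:

things = list(my_things())  # build the full list only where it's actually needed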

Ned Batchelder
+1 Generators instead of lists. Many algorithms can be revised slightly to work with generators instead of full-materialized lists.
S.Lott
generators are a good idea, true. i was wanting a general way to do it besides the set-in-place approach. i guess the difference is minor, though.
Claudiu
A: 

From what I understand, Python lists are already quite similar to ArrayLists. But if you want to tweak those parameters, I found this post on the net that may be interesting (basically, just create your own ScalableList extension; a rough sketch of the idea follows the link below):

http://mail.python.org/pipermail/python-list/2000-May/035082.html
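
(The post itself isn't reproduced here; as a rough sketch of the idea only, a wrapper that reserves its slots up front might look something like the following. The class name and methods are illustrative assumptions, not code from the linked post.)

class ScalableList(object):
    # Illustrative sketch: reserve storage up front, fill reserved slots on
    # append, and fall back to normal list growth once they run out.
    def __init__(self, capacity=0):
        self._data = [None] * capacity
        self._size = 0

    def append(self, value):
        if self._size < len(self._data):
            self._data[self._size] = value
        else:
            self._data.append(value)
        self._size += 1

    def to_list(self):
        return self._data[:self._size]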

Piotr Lesnicki
+19  A: 
def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result
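
The timing harness isn't shown; a minimal sketch using the standard timeit module (144 runs of each function, averaged), which should produce comparable numbers:

import timeit

# Time each function 144 times and report the average seconds per call.
for name in ("doAppend", "doAllocate"):
    total = timeit.timeit("%s()" % name,
                          setup="from __main__ import %s" % name,
                          number=144)
    print("%-13s %.4f" % (name, total / 144))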

Results. (Each function was evaluated 144 times and the durations averaged.)

simple append 0.0102
pre-allocate  0.0098

Conclusion. It barely matters.

Premature optimization is the root of all evil.

S.Lott
fair enough! i did a similar test and found a difference of 1.00 vs 0.8 or so. I guess it doesn't matter so much.
Claudiu
I agree 100% with this answer. The only thing I would add is that you may get some lift out of using generator expressions like: list(i for i in xrange(size))
Jeremy Cantrell
@Jeremy Cantrell: Feel free to post your results for comparison.
S.Lott
What if the preallocation method (size*[None]) itself is inefficient? Does the Python VM actually allocate the list at once, or grow it gradually, just as append() would?
haridsv
@haridsv: "What if the preallocation method (size*[None]) itself is inefficient?" What does that mean? It's equally efficient. It's not inefficient. It's the same.
S.Lott
What I was wondering is whether Python preallocates the list with the final size or only grows it on demand (i.e., attempts to grow "size" times, just as the regular append() would). Just considering all corners... it is simple enough to optimize, so it is quite likely that this case is already optimized, but that remains an assumption unless someone knows the internals for sure.
haridsv
@haridsv: (1) I can't understand what hair you're splitting. The time is the same. I don't know what other "optimization" you could be talking about. (2) The internals are trivially available -- you can read the source at any time.
S.Lott
@haridsv: I made some test by putting the allocation (`result=size*[None]`) outside the function and can't find a significant difference. By replacing the loop body by `result[i] = i`, I then get 1.13 ms versus 0.62 ms with preallocation.
rafak
@S.Lott: What haridsv is saying is that if int*list is implemented by growing a list one element at a time, then it's no wonder both implementations shown above take the same time. If this were the case, then it is possible that pre-allocating might be much faster - but we haven't yet found any code to test that case.
Tartley
@Tartley: "haven't yet found any code to test that case". Baffling. If it can't be expressed in Python, then what are we talking about?
S.Lott
Hey. It presumably can be expressed in Python, but nobody has yet posted it here. haridsv's point was that we're just assuming 'int * list' doesn't append to the list item by item. That assumption is probably valid, but we should check it. If it weren't valid, that would explain why the two functions you showed take almost identical times: under the covers they would be doing exactly the same thing, and hence wouldn't actually test the subject of this question. Best regards!
Tartley
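
(A quick empirical check of that point, for CPython: compare the bytes allocated for a list built by repetition with one built by repeated append(). sys.getsizeof reports the list object's current allocation, so over-allocation from incremental growth shows up as extra bytes; a sketch:)

import sys

n = 10000
repeated = n * [None]      # built in one step by list repetition
appended = []
for i in range(n):         # built one append() at a time
    appended.append(None)

# The repeated list is sized for exactly n slots; the appended list usually
# reports more, because growth over-allocates.
print("repetition: %d bytes" % sys.getsizeof(repeated))
print("append:     %d bytes" % sys.getsizeof(appended))
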
+1  A: 

I ran @S.Lott's code and saw the same 10% performance increase from pre-allocating. I tried @Jeremy's idea using a generator, and the generator performed even better than doAllocate. For my project the 10% improvement matters, so thanks to everyone; this helps a bunch.

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

def doGen( size=10000 ):
    return list("some unique object %d" % ( i, ) for i in xrange(size))
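
# Note: the @print_timing decorator used below isn't shown in the post; a
# minimal stand-in along these lines (an assumption, not the poster's exact
# code) makes the snippet self-contained.
import time

def print_timing(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_ms = (time.time() - start) * 1000.0
        print("%s took %0.3fms" % (func.__name__, elapsed_ms))
        return result
    return wrapper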

size=1000
@print_timing
def testAppend():
    for i in xrange(size):
        doAppend()

@print_timing
def testAlloc():
    for i in xrange(size):
        doAllocate()

@print_timing
def testGen():
    for i in xrange(size):
        doGen()


testAppend()
testAlloc()
testGen()

testAppend took 14440.000ms
testAlloc took 13580.000ms
testGen took 13430.000ms
Jason Wiener
"For my proj the 10% improvement matters"? Really? You can **prove** that list allocation is **the** bottleneck? I'd like to see more on that. Do you have a blog where you could explain how this actually helped?
S.Lott