A coworker recently wrote a program in which he used a Python list as a queue. In other words, he used .append(x) when needing to insert items and .pop(0) when needing to remove items.

I know that Python has collections.deque and I'm trying to figure out whether to spend my (limited) time to rewrite this code to use it. Assuming that we perform millions of appends and pops but never have more than a few thousand entries, will his list usage be a problem?

In particular, will the underlying array used by the Python list implementation keep growing indefinitely, holding millions of slots even though the list only contains a thousand items, or will Python eventually do a realloc and free up some of that memory?
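For reference, the two patterns being compared can be sketched as follows (a minimal sketch; the queued values are illustrative):

```python
from collections import deque

# List used as a FIFO queue: append() is amortized O(1),
# but pop(0) must shift every remaining element left, so it's O(n).
list_queue = []
list_queue.append("task1")
list_queue.append("task2")
first = list_queue.pop(0)  # "task1"

# Near drop-in deque replacement: popleft() is O(1).
deque_queue = deque()
deque_queue.append("task1")
deque_queue.append("task2")
first = deque_queue.popleft()  # "task1"
```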

+1  A: 

It sounds like a bit of empirical testing might be the best thing to do here - second-order effects might make one approach better in practice, even if it's not better in theory.

Peter
+6  A: 

You won't run out of memory using the list implementation, but performance will be poor. From the docs:

Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.

So using a deque will be much faster.

John Millikin
"much faster"? Or potentially faster?
S.Lott
For lists of size 1000, 10x. More than an order of magnitude is "much faster" in my book.
Ants Aasma
Lott: popping from a list is O(N), from a deque is O(1).
John Millikin
The original question was on garbage collection -- not speed. Regarding speed, if the list stays short (under a dozen items), perhaps the O(n) issue isn't 10x.
S.Lott
CPython doesn't use garbage collection, it uses reference counting. Performance will be directly related to how often memory re-allocations are performed.
John Millikin
Unless I'm missing something, switching to a deque shouldn't require a lot of work compared to, say, posting this question.
Seun Osewa
@John, you're very deeply wrong: CPython uses BOTH reference counting AND mark-and-sweep, generational garbage collection (which you can control to some extent via the gc module in the standard library). So "CPython doesn't use garbage collection" is a deeply flawed statement indeed.
Alex Martelli
@S.Lott, OP says "never more than a few thousands" and you think this translates to "under a dozen"?! Must be VERY small thousands indeed, what?-)
Alex Martelli
@Alex: The garbage collector in `gc` is optional, and might not be enabled; relying on its presence is a bad idea. Reference counting is the only memory management guaranteed to be present when running in CPython.
John Millikin
You might actually run out of memory using a list. Deques are allocated in buckets which do not have to be contiguous with each other, so you can basically create a deque as large as your available memory. Lists, however, are arrays and must be allocated contiguously, which may cause you trouble once they reach megabytes in size (and, at the very least, they can cause serious memory fragmentation due to reallocation).
Nick Bastin
+1  A: 

Every .pop(0) takes N steps, since all remaining elements have to be shifted. The required memory will not grow endlessly; it will only be about as large as needed for the items currently held.
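That claim can be checked empirically with sys.getsizeof, which reports how many bytes the list object (including its backing array) occupies. In CPython the backing array is reallocated downward as the list shrinks; exact figures vary by version and platform, so the numbers are only indicative:

```python
import sys

q = []
for i in range(10_000):
    q.append(i)
grown = sys.getsizeof(q)   # backing array has room for ~10,000 slots

while len(q) > 100:
    q.pop(0)               # CPython shrinks the allocation as the list empties
shrunk = sys.getsizeof(q)  # far smaller than `grown`

print(grown, shrunk)
```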

I'd recommend using a deque to get O(1) append and pop at the front.

bayer
+4  A: 

Some answers claimed a "10x" speed advantage for deque vs list-used-as-FIFO when both have 1000 entries, but that's a bit of an overbid:

$ python -mtimeit -s'q=range(1000)' 'q.append(23); q.pop(0)'
1000000 loops, best of 3: 1.24 usec per loop
$ python -mtimeit -s'import collections; q=collections.deque(range(1000))' 'q.append(23); q.popleft()'
1000000 loops, best of 3: 0.573 usec per loop

python -mtimeit is your friend -- a really useful and simple micro-benchmarking approach! With it you can of course also trivially explore performance in much-smaller cases:

$ python -mtimeit -s'q=range(100)' 'q.append(23); q.pop(0)'
1000000 loops, best of 3: 0.972 usec per loop
$ python -mtimeit -s'import collections; q=collections.deque(range(100))' 'q.append(23); q.popleft()'
1000000 loops, best of 3: 0.576 usec per loop

(not very different for 12 instead of 100 items btw), and in much-larger ones:

$ python -mtimeit -s'q=range(10000)' 'q.append(23); q.pop(0)'
100000 loops, best of 3: 5.81 usec per loop
$ python -mtimeit -s'import collections; q=collections.deque(range(10000))' 'q.append(23); q.popleft()'
1000000 loops, best of 3: 0.574 usec per loop

You can see that the claim of O(1) performance for deque is well founded, while a list is over twice as slow at around 1,000 items and an order of magnitude slower at around 10,000. You can also see that even then you're only wasting 5 microseconds or so per append/pop pair, so you can judge how significant that wastage is (though if that's all you're doing with the container, deque has no downside, so you might as well switch even if 5 usec more or less won't make an important difference).
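The same comparison can also be driven from the timeit module rather than the command line, which is handy on Python 3 where range() no longer returns a list. A sketch; absolute numbers will differ from machine to machine:

```python
import timeit

n = 1000  # queue size held steady during the benchmark

list_time = timeit.timeit(
    "q.append(23); q.pop(0)",
    setup=f"q = list(range({n}))",
    number=100_000,
)
deque_time = timeit.timeit(
    "q.append(23); q.popleft()",
    setup=f"from collections import deque; q = deque(range({n}))",
    number=100_000,
)
print(f"list: {list_time:.3f}s  deque: {deque_time:.3f}s")
```

As in the shell runs above, pop(0) on a 1,000-item list should come out measurably slower than popleft() on a deque of the same size.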

Alex Martelli
+1  A: 

From Beazley's Python Essential Reference, Fourth Edition, p. 194:

Some library modules provide new types that outperform the built-ins at certain tasks. For instance, the collections.deque type provides similar functionality to a list but has been highly optimized for the insertion of items at both ends. A list, in contrast, is only efficient when appending items at the end. If you insert items at the front, all of the other elements need to be shifted in order to make room. The time required to do this grows as the list gets larger and larger. Just to give you an idea of the difference, here is a timing measurement of inserting one million items at the front of a list and a deque:

And there follows this code sample:

>>> from timeit import timeit
>>> timeit('s.appendleft(37)', 'import collections; s = collections.deque()', number=1000000)
0.13162776274638258
>>> timeit('s.insert(0,37)', 's = []', number=1000000)
932.07849908298408

Timings are from my machine.

hughdbrown