I was doing a foolish thing like

from itertools import combinations
rows = combinations(range(1140), 17)
all_rows = []
for row in rows:
    all_rows.append(row)

No surprise, I ran out of memory address space (32-bit Python 3.1). My question is: how do I calculate how much memory address space I will need for a large list? In this case the list is on the order of 2.3×10^37 elements. I assume there is a function in Python that returns the information I am looking for, or at least the size of a smaller but similar list. What are those tools?

+1  A: 

You are asking for the binomial coefficient:

http://en.wikipedia.org/wiki/Binomial_coefficient

http://www.brpreiss.com/books/opus7/programs/pgm14_10.txt
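
For instance, a sketch of computing the coefficient directly (math.factorial has been available since Python 2.6, so this works on your 3.1; n_choose_k is just an illustrative helper name):

from math import factorial

def n_choose_k(n, k):
    # C(n, k) = n! / (k! * (n - k)!)
    return factorial(n) // (factorial(k) * factorial(n - k))

print(n_choose_k(1140, 17))  # about 2.3e37 rows -- far too many to materialize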

Anyhow, it sounds like you are trying to solve an NP-complete problem by brute force ;)

Hamish Grubijan
It is an NP-complete problem, but mostly I am trying to learn a little and maybe stumble upon an answer. The question is stated here: http://blog.computationalcomplexity.org/2009/11/17x17-challenge-worth-28900-this-is-not.html
Vincent
I would like the list of actual combinations; I know how many there are.
Vincent
+2  A: 

Well, first, rather than writing:

all_rows = []
for row in rows:
    all_rows.append(row)

You can simply write:

all_rows = list(rows)

which will be quite a bit more efficient.

Then, there are two things to consider for memory consumption of a list:

  • memory consumption of the objects comprising the list; this obviously depends on these objects, their type, and whether there is a lot of sharing or not
  • memory consumption of the list itself; each object in the list is referenced by a pointer, which takes 4 bytes in 32-bit mode and 8 bytes in 64-bit mode; so, roughly, the size of the list itself is (4 or 8 bytes) times the number of objects in the list (ignoring the fixed list header and the moderate over-allocation that Python lists do)

By the way, in recent Python versions you can use sys.getsizeof() to get the size of an object:

>>> import sys
>>> sys.getsizeof([None] * 100)
872
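
You can estimate the per-reference cost on your own interpreter (a sketch; a list built by multiplication is allocated exactly, so over-allocation doesn't skew the figure):

import sys

empty = sys.getsizeof([])
per_ref = (sys.getsizeof([None] * 1000) - empty) / 1000.0
print(per_ref)  # ~4.0 on 32-bit builds, ~8.0 on 64-bit builds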
Antoine P.
Thanks for the hints.
Vincent
+5  A: 

There's a handy function sys.getsizeof() (since Python 2.6) that helps with this:

>>> import sys
>>> sys.getsizeof(1)  # integer
12
>>> sys.getsizeof([]) # empty list
36
>>> sys.getsizeof(()) # empty tuple
28
>>> sys.getsizeof((1,))  # tuple with one element
32

From that you can see that each integer takes up 12 bytes, and the memory for each reference in a list or tuple is 4 bytes (on a 32-bit machine) plus the overhead (36 or 28 bytes respectively).

If your result has tuples of length 17 with integers, then you'd have 17*(12+4)+28 or 300 bytes per tuple. The result itself is a list, so 36 bytes plus 4 per reference. Find out how long the list will be (call it N) and you have 36+N*(4+300) as the total bytes required.
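
Putting those numbers together for the case in the question (a sketch: math.comb needs Python 3.8+, and the 12/4/28/36-byte figures are the 32-bit values above):

import math

N = math.comb(1140, 17)           # requires Python 3.8+; ~2.3e37 combinations
per_tuple = 17 * (12 + 4) + 28    # 300 bytes per tuple of 17 integers
total = 36 + N * (4 + per_tuple)  # list header + one reference + one tuple per row
print("%.2e bytes" % total)       # ~7e39 bytes -- far beyond any address space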

Edit: There's one other thing that could significantly affect that result. Python creates new integer objects as required for most integer values, but for small ones (empirically determined to be the range [-5, 256] on Python 2.6.4 on Windows) it pre-creates them all and re-uses them, so references to those values all share a single object. If a large portion of your values fall in that range, this would significantly reduce the memory consumption. (On that Python, 257 is not 257+0 ;-))
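
You can see the caching at an interactive prompt (a CPython implementation detail; putting both assignments on one line may change the result, because constants within a single statement can be shared):

>>> a = 256
>>> b = 256
>>> a is b   # both names refer to the single cached 256 object
True
>>> a = 257
>>> b = 257
>>> a is b   # 257 is outside the cache, so each assignment made a new object
False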

Peter Hansen
Note also a newly open-sourced package from Slide Inc: http://github.com/slideinc/meminfo, which is a "C extension for finding precise in-memory sizes of python objects".
Peter Hansen
+1  A: 

Addendum: since you're dealing with lists of integers and are worried about memory usage, there is also the array module:

[array] defines an object type which can compactly represent an array of basic values: characters, integers, floating point numbers. Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time [...].
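
For example, a sketch of packing one row's 17 column indices into an array (type code 'H' stores unsigned 16-bit integers, ample for indices below 1140):

from array import array

row = array('H', range(17))     # 2 bytes per element instead of a boxed int object
print(row.itemsize * len(row))  # 34 bytes of payload, vs. ~300 bytes for a tuple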

Alex Brasetvik