
I was surprised that sys.getsizeof(10000*[x]) is 40036 regardless of x: 0, "a", 1000*"a", {}.
Is there a deep_getsizeof that properly accounts for elements that share memory?
(The question came from looking at in-memory database tables like range(1000000) -> province names: list or dict?)
(Python is 2.6.4 on a Mac PPC.)
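The observation is easy to reproduce. A quick check (the exact number, 40036 here, depends on platform and Python version; the snippet is written to run on current Pythons as well):

```python
import sys

# getsizeof reports only the list object itself (header plus
# pointer array), never the elements the slots point to, so the
# result is identical whatever x is.
sizes = [sys.getsizeof(10000 * [x]) for x in (0, "a", 1000 * "a", {})]
print(sizes)
```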

Added: 10000*["Mississippi"] is 10000 pointers to one "Mississippi", as several people have pointed out. Try this:

nstates = [AlabamatoWyoming() for j in xrange(N)]

where AlabamatoWyoming() -> a string "Alabama" .. "Wyoming". What's deep_getsizeof(nstates)?
(How can we tell?

  • a proper deep_getsizeof: difficult, ~ a gc tracer
  • estimate from total VM
  • inside knowledge of the Python implementation
  • guess.)

Added 25jan: see also when-does-python-allocate-new-memory-for-identical-strings

+6  A: 

10000 * [x] will produce a list of 10000 references to the same object, so the sizeof is actually closer to correct than you think. However, a deep sizeof is very problematic because it's impossible to tell Python where you want the measurement to stop. Every object references a typeobject. Should the typeobject be counted? What if the reference to the typeobject is the last one, so that deleting the object would delete the typeobject as well? What if multiple (different) objects in the list refer to the same string object? Should it be counted once, or multiple times?

In short, getting the size of a data structure is very complicated, and sys.getsizeof() should never have been added :S
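The sharing that makes the shallow size "closer to correct" can be verified directly with an identity check; a minimal sketch:

```python
import sys

# 10000 * [x] fills 10000 slots with pointers to the one object x;
# nothing is copied.
x = "Mississippi"
lst = 10000 * [x]
assert all(item is x for item in lst)

# The shallow size therefore reflects only the slot count.
print(sys.getsizeof(lst))
```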

Thomas Wouters
+1 you must define where to stop for any deep-stuff. Do you want to report memory shared by other parts of the code? Then that's almost everything, since it has a reference to `object`.
nosklo
+2  A: 

If your list holds only objects of the same size, you can get a more accurate estimate by doing this:

def getSize(array):
    return sys.getsizeof(array) + len(array) * sys.getsizeof(array[0])

Obviously it's not going to work as well for strings of variable length.

If you only want to calculate the size for debugging or during development and you don't care about the performance, you could iterate over all items recursively and calculate the total size. Note that this solution is not going to handle multiple references to the same object correctly.
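Such a recursive estimator might look like the following sketch (a hypothetical deep_getsizeof, not part of the stdlib, handling only the common containers). Tracking object ids it has already visited lets it count shared objects once, which also addresses the multiple-references caveat:

```python
import sys

def deep_getsizeof(obj, seen=None):
    # Hypothetical helper: recurse through common containers,
    # remembering ids already visited so shared elements are
    # counted only once.  A rough estimate, not an exact account
    # of everything an object keeps alive.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

# 10000 pointers to one shared string: the string is counted once.
shared = 10000 * ["Mississippi"]
print(deep_getsizeof(shared))
```

For serious use, third-party tools such as Pympler's asizeof or guppy/heapy (mentioned in another answer) do this bookkeeping far more carefully.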

Nadia Alramli
A: 

mylist = 10000 * [x] means create a list of size 10000 with 10000 references to object x.

Object x is not copied: only a single copy exists in memory!

So to use getsizeof, it would be: sys.getsizeof(mylist) + sys.getsizeof(x)

nosklo
That's not the case for immutable types: sys.getsizeof(range(1000)) returns the same size as sys.getsizeof([0] * 1000)
Nadia Alramli
@Nadia Alramli: Exactly my point - both your examples are running `sys.getsizeof` on a list of 1000 items - it doesn't matter what the items are, so they'll return the same size.
nosklo
+3  A: 

Have a look at guppy/heapy; I haven't played around with it too much myself, but a few of my co-workers have used it for memory profiling with good results.

The documentation could be better, but this howto does a decent job of explaining the basic concepts.

Pär Wieslander
Thanks Pär, will try it; it shows how difficult the problem is. Do any of your co-workers have a short note on saving memory in Python, which would answer e.g. range(1000000) -> province names: list or dict?
Denis