The program I've written stores a large amount of data in dictionaries. Specifically, I'm creating 1588 instances of a class, each of which contains 15 dictionaries with 1500 float to float mappings. This process has been using up the 2GB of memory on my laptop pretty quickly (I start writing to swap at about the 1000th instance of the class).

My question is, which of the following is using up my memory?

  • Some 34 million pairs of floats?
  • The overhead of 22,500 dictionaries?
  • The overhead of ~1,500 class instances?

To me it seems like the memory hog should be the huge number of floating point numbers that I'm holding in memory. However, if what I've read so far is correct, each of my floating point numbers takes up 16 bytes. Since I have 34 million pairs, this should be about 108 million bytes, which should be just over a gigabyte.

Is there something I'm not taking into consideration here?
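For reference, here's a stripped-down stand-in for the structure described (the class name `Mapper` and the stored values are made up; the shape -- 15 dicts of 1500 float -> float entries per instance -- matches what I'm building):

import sys

class Mapper(object):
    """Illustrative stand-in: 15 dicts of 1500 float -> float entries each."""
    def __init__(self):
        self.tables = [{float(i): float(i) / 2 for i in range(1500)}
                       for _ in range(15)]

m = Mapper()
per_table = sys.getsizeof(m.tables[0])
print(per_table)       # one hash table, not counting the float objects it points to
print(per_table * 15)  # rough per-instance cost of the tables alone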

+7  A: 

The floats do take up 16 bytes apiece, and a dict with 1500 entries about 100k:

>>> import sys
>>> sys.getsizeof(1.0)
16
>>> d = dict.fromkeys((float(i) for i in range(1500)), 2.0)
>>> sys.getsizeof(d)
98444

So the 22,500 dicts take over 2GB all by themselves, and the 68 million floats another GB or so. I'm not sure how you computed 68 million times 16 bytes as only ~100M -- you may have dropped a zero somewhere.
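A quick back-of-the-envelope check of those totals, using the per-object sizes reported above (they vary with Python version and platform; the dict count is the question's round figure of 1500 instances times 15 dicts):

num_dicts = 22500            # 1500 instances * 15 dicts each
floats_per_dict = 2 * 1500   # keys plus values
bytes_per_dict = 98444       # getsizeof of a 1500-entry dict, from above
bytes_per_float = 16         # getsizeof(1.0), from above

print(num_dicts * bytes_per_dict / 2.0 ** 30)                     # ~2.06 GiB of hash tables
print(num_dicts * floats_per_dict * bytes_per_float / 2.0 ** 30)  # ~1.01 GiB of float objects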

The class itself takes up a negligible amount, and the 1,588 instances of it (net of the objects they refer to, of course, just as getsizeof gives us such net amounts for the dicts) cost not much more than a smallish dict each, so that's hardly the problem. I.e.:

>>> class Sic(object): pass   # a minimal class, just to measure per-instance overhead
...
>>> sys.getsizeof(Sic)
452
>>> sys.getsizeof(Sic())
32
>>> sys.getsizeof(Sic().__dict__)
524

452 for the class, (524 + 32) * 1588 ≈ 883K for all the instances; as you can see, that's not the worry when the dicts and the floats each run to gigabytes.

Alex Martelli
If I plug in range(1000) or range(1250) instead of range(1500) in your code above, I get a size of 24712 bytes. This seems to hold until range(1365), at which point there is a discrete jump in the size. Is this due to how Python allocates memory? If so, why is this the case?
Wilduck
I thought a Python float was implemented using a C double, which would generally be 64-bit (8 bytes). The double type also takes 8 bytes in the `struct` module. I can't argue with the result of `getsizeof`, though...
Scott Griffiths
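The gap Scott notices is exactly the per-object header: `struct` sees only the raw 8-byte payload, while `getsizeof` reports the whole Python object. A quick way to see the two figures side by side (the second number depends on the build):

import struct
import sys

print(struct.calcsize('d'))  # 8: the raw C double payload
print(sys.getsizeof(1.0))    # the whole Python float object: payload plus the object
                             # header (refcount, type pointer); 16 here, 24 on many
                             # 64-bit builds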
@Scott, every Python object has some overhead (reference counter, pointer to the type) in addition to its "payload", and all allocations are rounded up to a multiple of 16 bytes to play nicely with the system malloc. To store floats compactly, you can use an `array.array` of them (but that's list-like, not dict-like).
Alex Martelli
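A rough sketch of that trade-off (the exact sizes vary by Python version, and lookups would then have to go through e.g. a sorted key array plus `bisect` instead of hashing):

import sys
from array import array

n = 1500
d = {float(i): 2.0 for i in range(n)}              # hash table plus a float object per key/value
keys = array('d', (float(i) for i in range(n)))    # 8 raw bytes per key, no per-key object
vals = array('d', (2.0 for _ in range(n)))         # likewise for the values

print(sys.getsizeof(d))                            # the table alone; the float objects cost extra
print(sys.getsizeof(keys) + sys.getsizeof(vals))   # ~24 KB total, and nothing else to count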
@Wilduck, dicts are hash tables, so they must always be overallocated to get good performance. As containers grow, the overallocation also grows (to amortize the reallocation overhead and keep insertion amortized O(1)).
Alex Martelli
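Those jumps can be probed directly; a small sketch along these lines prints every entry count at which the reported table size changes (the exact thresholds and sizes depend on the Python version; on the interpreter above, one such jump is the 1365 -> 1366 step Wilduck describes):

import sys

# Build dicts of increasing length and print each point where the size
# getsizeof reports for the table jumps.
prev = None
for n in range(1, 2001):
    d = dict.fromkeys((float(i) for i in range(n)), 2.0)
    size = sys.getsizeof(d)
    if size != prev:
        print(n, size)
    prev = size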
@Alex I would have expected the allocation/overallocation to grow, but I was surprised that it increases by a factor of 4 at the jump from 1365 to 1366 entries. If I could manage to get all of my dictionaries down to 1300 entries, would that cut the memory used by my dictionaries to about 500MB instead of 2GB?
Wilduck
Alex Martelli
It looks like the actual memory footprint is roughly halved when I keep the number of entries per dictionary below that threshold. So the findings agree with the notes, but not with getsizeof. Interesting. Also helpful.
Wilduck
@Wilduck, thanks for the report -- makes me wonder if there may be a bug in getsizeof...!
Alex Martelli
It seems that all of the size increases that getsizeof reports for dicts are four-fold increases. There's definitely an error somewhere, either in the documentation or the module.
Wilduck