views:

116

answers:

4

I have a python program that is going to eat a lot of memory, primarily in a dict. This dict will be responsible for assigning a unique integer value to a very large set of keys. As I am working with large matrices, I need a key-to-index correspondence that can also be recovered from (i.e., once matrix computations are complete, I need to map the values back to the original keys).

I believe this amount will eventually surpass available memory. I am wondering how this will be handled with regards to swap space. Perhaps there is a better data structure for this purpose.

+5  A: 

You need a database, if the data will exceed memory. The indexing of dictionaries isn't designed for good performance when a dictionary is bigger than memory.

Andrew McGregor
+1  A: 

Swap space is a kernel feature and transparant to the user (python).

If you do have a huge dict and don't need all the data at once, you could look at redis which might do what you want. Or maybe not :)

extraneon
A: 

It will just end up in swap trashing, because a hash table has very much randomized memory access patterns.

If you know that the map exceeds the size of the physical memory, you could consider using a data structure on the disk in the first place. This especially if you don't need the data structure during the computation. When the hash table triggers swapping, it creates problems also outside the hash table itself.

antti.huima
"thrashing" you mean
Wahnfrieden
A: 

As far as I can remember, when a dict is expanded it just relies on C's malloc. The program will keep running as long as malloc keeps succeeding. Most OS's will keep malloc working as long as there is enough memory, and then as long as there are pages that can be swapped in. In either case Python will throw a MemoryError exception when malloc fails, as per the documentation. As far as the data structure goes, dict is going to be very efficient space-wise. The only way to really do better is to use an analytical function to map the values back and forth.

zdav