Hi there,

I bumped into a case where I need a big (= huge) Python dictionary, which turned out to be quite memory-consuming. However, since all of the values are of a single type (long), as are the keys, I figured I could use a Python (or numpy, it doesn't really matter) array for the values, and wrap the needed interface (in: x ; out: d[x]) in an object which actually uses such arrays for the key and value storage.

I can use an index-conversion object (input --> index in 1..n, where n is the number of distinct keys) and return array[index]. I can elaborate on some techniques for implementing such an indexing method with reasonable memory requirements; it works, and even pretty well. However, I wonder whether such a data-structure object already exists (in Python, or wrapped for Python from C/C++) in any package (I checked collections, and did some Google searches).
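For illustration, here is a minimal sketch of the interface I mean (the names are placeholders, and the plain dict doing the index-conversion step is just a stand-in for whatever indexing technique is used; the point is that the values live unboxed in a compact array):

    from array import array

    class LongDict(object):
        """Dict-like wrapper: keys map to indices into an array of longs."""
        def __init__(self):
            # Stand-in index structure; this is the part I want to optimize.
            self._index = {}
            self._values = array('l')  # values stored as raw machine longs

        def __setitem__(self, key, value):
            i = self._index.get(key)
            if i is None:
                self._index[key] = len(self._values)
                self._values.append(value)
            else:
                self._values[i] = value

        def __getitem__(self, key):
            return self._values[self._index[key]]  # in: x ; out: d[x]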

Any comments are welcome, thanks.

+2  A: 

This kind of task calls for typical database-style access (a large volume of data in columns of a given type). You would create a simple table with an indexed key column, for fast access. I don't have experience with it, but you might want to check out the standard sqlite3 module.
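For instance, a minimal sketch (the table and column names are arbitrary):

    import sqlite3

    # ':memory:' keeps the table in RAM; pass a filename instead to keep
    # the data on disk, which is the real memory saver for a huge mapping.
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE kv (key INTEGER PRIMARY KEY, value INTEGER)')
    conn.executemany('INSERT INTO kv VALUES (?, ?)',
                     [(1, 10), (2, 20), (3, 30)])
    conn.commit()

    # INTEGER PRIMARY KEY is indexed, so lookups are fast.
    row = conn.execute('SELECT value FROM kv WHERE key = ?', (2,)).fetchone()
    print(row[0])  # -> 20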

If your keys do not change over time, you could alternatively put all your data in two memory-optimized Python arrays (standard array module): one array containing the sorted keys, and the other the corresponding values. You could then find a key's index with the optimized bisect.bisect_left function (bisect.bisect would point just past an existing key).
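A rough sketch of that approach (the class name is mine):

    from array import array
    from bisect import bisect_left

    class FrozenLongMap(object):
        """Read-only long->long mapping over two parallel arrays of longs."""
        def __init__(self, items):
            pairs = sorted(items)  # sort once, by key
            self._keys = array('l', (k for k, v in pairs))
            self._values = array('l', (v for k, v in pairs))

        def __getitem__(self, key):
            i = bisect_left(self._keys, key)  # O(log n) binary search
            if i < len(self._keys) and self._keys[i] == key:
                return self._values[i]
            raise KeyError(key)

    m = FrozenLongMap({5: 50, 1: 10, 3: 30}.items())
    print(m[3])  # -> 30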

EOL
+1 for suggesting a DB and another Python solution.
Xavier Ho
A: 

You might try using std::map. Boost.Python provides a Python wrapping for std::map out of the box.
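Usage from Python would then look like an ordinary mapping. The module and class names below are hypothetical; the C++ side would expose std::map<long, long>, e.g. via Boost.Python's map_indexing_suite:

    # Hypothetical: assumes an extension module 'longmap' built with
    # Boost.Python, exposing std::map<long, long> as LongMap.
    import longmap

    m = longmap.LongMap()
    m[42] = 1000     # keys and values stored as C longs
    print(m[42])     # -> 1000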

taleinat