Hi all,
I need to optimize the RAM usage of my application.
PLEASE spare me the lectures telling me I shouldn't care about memory when coding Python. I have a memory problem because I use very large defaultdicts (yes, I also want to be fast). My current memory consumption is 350 MB and growing. I already cannot use shared hosting, and if Apache opens more processes the memory doubles and triples... and it is expensive.
I have done extensive profiling and I know exactly where my problems are.
I have several large (>100K entries) dictionaries with Unicode keys. A dictionary starts at about 140 bytes and grows fast, but the bigger problem is the keys. I've read that Python 'interns' strings, sharing them in memory so that lookups can be done as identity comparisons. I'm not sure this also applies to unicode strings (I was not able to intern() them).
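To show what I mean, here is a minimal sketch of the kind of check I did (Python 2.6 syntax; the key strings are made-up placeholders):

import sys

d = {}
print sys.getsizeof(d)          # empty dict overhead (about 140 bytes in my case)

s = intern('byte_string_key')   # interning works for plain byte strings
try:
    intern(u'unicode_key')      # but not for unicode objects...
except TypeError, e:
    print e                     # ...intern() rejects them with a TypeError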
The objects stored in the dictionary are lists of tuples (an_object, an_int, another_int).
my_big_dict[some_unicode_string].append((my_object, an_int, another_int))
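For context, the structure is built roughly like this (a stripped-down sketch; MyObject and the literal values are placeholders for the real data):

from collections import defaultdict

class MyObject(object):
    pass

my_big_dict = defaultdict(list)       # one of several such dicts, each >100K keys

some_unicode_string = u'some key'     # stand-ins for the real data
my_object, an_int, another_int = MyObject(), 1, 2

my_big_dict[some_unicode_string].append((my_object, an_int, another_int))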
I already found that it is worthwhile to split into several dictionaries because the tuples take a lot of space...
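One way to do the split, as a rough sketch (not my exact code; the names objs / ints_a / ints_b / add_entry are made up): keep one dict per tuple slot, so no per-entry tuple objects are created.

from collections import defaultdict

# Three parallel dicts instead of one dict of (obj, int, int) tuples.
objs = defaultdict(list)
ints_a = defaultdict(list)
ints_b = defaultdict(list)

def add_entry(key, obj, a, b):
    objs[key].append(obj)
    ints_a[key].append(a)
    ints_b[key].append(b)

add_entry(u'example key', object(), 1, 2)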
I found that I could save RAM by hashing the strings before using them as keys!
But then, sadly, I ran into birthday collisions on my 32-bit system. (Side question: is there a 64-bit-key dictionary I can use on a 32-bit system?)
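To make the hashing idea concrete, here is a sketch (key64 is a made-up helper): since Python ints/longs are arbitrary precision, a dict keyed by 64-bit integers works even on a 32-bit build, and a 64-bit hash makes birthday collisions far less likely than the builtin 32-bit hash():

import hashlib
import struct

def key64(u):
    # First 8 bytes of an MD5 digest as a 64-bit integer key.
    # Python longs have arbitrary precision, so this works on 32-bit builds too.
    digest = hashlib.md5(u.encode('utf-8')).digest()
    return struct.unpack('<Q', digest[:8])[0]

d = {}
d[key64(u'some unicode key')] = [('payload', 1, 2)]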
Python 2.6.5 on both Linux (production) and Windows. Any tips on optimizing the memory usage of dictionaries / lists / tuples? I have even thought of using C - I don't care if this very small piece of code is ugly. It's just a single, isolated spot in the code.
Thanks in advance!