In one of my current side projects, I am scanning through some text looking at the frequency of word triplets. In my first go at it, I used the default dictionary three levels deep. In other words, topDictionary[word1][word2][word3] returns the number of times these words appear in the text, topdictionary[word1][word2] returns a dictionary with all the words that appeared following words 1 and 2, etc.
This functions correctly, but it is very memory intensive. In my initial tests it used something like 20 times the memory of just storing the triplets in a text file, which seems like an overly large amount of memory overhead.
My suspicion is that many of these dictionaries are being created with many more slots than are actually being used, so I want to replace the dictionaries with something else that is more memory efficient when used in this manner. I would strongly prefer a solution that allows key lookups along the lines of the dictionaries.
From what I know of data structures, a balanced binary search tree using something like red-black or AVL would probably be ideal, but I would really prefer not to implement them myself. If possible, I'd prefer to stick with standard python libraries, but I'm definitely open to other alternatives if they would work best.
So, does anyone have any suggestions for me?
Edited to add:
Thanks for the responses so far. A few of the answers so far have suggested using tuples, which didn't really do much for me when I condensed the first two words into a tuple. I am hesitant to use all three as a key since I want it to be easy to look up all third words given the first two. (ie I want something like the result of topDict[word1,word2].keys() ).
The current dataset I am playing around with is the most recent version of Wikipedia For Schools. The results of parsing the first thousand pages, for example, is something like 11MB for a text file where each line is the three words and the count all tab separated. Storing the text in the dictionary format I am now using takes around 185MB. I know that there will be some additional overhead for pointers and whatnot, but the difference seems excessive.
Once again, thank you all for the responses so far.