views:

36

answers:

3

I have billions of records (keys/values) that I want to compactly store in memory, and the only operation I need to support is looking up a value by its key. Keys and values are both small strings. The most important thing is how compressed the data structure is; it should use the internal structure of the keys in a deeper way than a simple associative array. For example, mapping the keys "apple", "apply", and "apron" to the values "1", "2", and "3" should somehow be compressed. What's the data structure I'm looking for?

+2  A: 

It sounds like you want a trie - it does the kind of "compression" that you describe, by storing each prefix only once.

I assume you have enough memory to store "billions" of keys, and of course, you need to be on a 64-bit system to be able to even address so many items in the first place.

casablanca
+2  A: 

You might try a Trie. It forms a tree structure out of the key strings themselves. There wouldn't be empty locations (as in a hash map).

Brian Clements
A: 

Even if the data you are handling are small strings, are you really sure that you need so much data in memory? This could easily hit gigabytes of memory, and most data will probably not be queried so frequently.

A finely tuned database may be enough for your needs.

mdrg