I am working with a large set (5-20 million) of String keys (average length 10 chars) which I need to store in an in memory data structure that supports the following operation in constant time or near constant time:
// Returns true if the input is present in the container, false otherwise
public boolean contains(String input)
Java's Hashmap is proving to be more than satisfactory as far as throughput is concerned but is taking up a lot of memory. I am looking for a solution that is memory efficient and still supports a throughput that is decent (comparable with or nearly as good as hashing).
I don't care about the insertion/deletion times. In my application, I will be performing only insertions (only at startup time) and will subsequently only be querying the data structure using the contains
method for the life of the application.
I read that the HAT-Trie data structure is closest for my needs. I am wondering if there is a library that has an implementation.
Other suggestions with pointers to implementations welcome.
Thank You.