I have a set of 5 million strings. These are currently stored in a single column MySQL table. My application has to perform lookups and check if a given string is in the set. This can of course be done with a HashSet (in Java). But instead of building a custom solution, I was wondering if there are any existing, widely used, proven solutions that do this? It seems like a common scenario. The solution should be scalable (the set might increase beyond 5 million), have failover (so probably distributed) and perform well under a huge number of requests. Any suggestions?
Update: My app can also query to check if a given set of strings is present in the global (the 5 million one) set.