I need a disk-backed Map structure to use in a Java app. It must meet the following criteria:
- Capable of storing millions of records (even billions)
- Fast lookup - the majority of operations on the Map will simply be to check whether a key already exists. This and the first criterion above are the most important. There should be an effective in-memory caching mechanism for frequently used keys.
- Persistent, but does not need to be transactional, and can live with some data loss on failure, i.e. I'm happy to sync with disk periodically.
- Capable of storing simple primitive types - but I don't need to store serialised objects.
- It does not need to be distributed, i.e. it will run entirely on one machine.
- Simple to set up & free to use.
- No SQL-type queries required.
Record keys will be strings or longs. As described above, reads will be much more frequent than writes, and the majority of reads will simply check whether a key exists (i.e. they will not need to read the key's associated data). Each record will be updated only once, and records are never deleted.
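To make the access pattern concrete, here is a minimal sketch of the kind of in-memory caching I have in mind, assuming a plain `LinkedHashMap` in access order as an LRU cache. The class name `CachedKeyLookup` and the `diskLookup` predicate are hypothetical placeholders for whatever on-disk store ends up being used; they are not part of any real library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical wrapper: an LRU cache of keys known to exist,
// placed in front of a slower disk-backed existence check.
final class CachedKeyLookup<K> {
    private final Map<K, Boolean> cache;
    private final Predicate<K> diskLookup; // placeholder for the real store

    CachedKeyLookup(final int maxEntries, Predicate<K> diskLookup) {
        this.diskLookup = diskLookup;
        // accessOrder = true makes iteration order least-recently-used first
        this.cache = new LinkedHashMap<K, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
                return size() > maxEntries; // evict the least recently used key
            }
        };
    }

    synchronized boolean containsKey(K key) {
        // get() (unlike containsKey) refreshes the entry's position in the LRU order
        if (cache.get(key) != null) {
            return true;
        }
        boolean exists = diskLookup.test(key);
        if (exists) {
            // Only positive results are cached: records are never deleted,
            // so a cached "exists" answer cannot become stale.
            cache.put(key, Boolean.TRUE);
        }
        return exists;
    }
}
```

Negative results are deliberately not cached here, since a key that is missing now may be written later. Whether that trade-off is right depends on how often lookups miss.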
I'm currently using BDB JE but am looking for other options, and I'm interested in hearing what others recommend or advise against using.
Update
I have since improved the query performance on the existing BDB setup by reducing the dependency on secondary keys. Some of our queries required a join on two secondary keys. By reducing the number of secondary keys joined on (e.g. by combining them into a composite key), we have removed a level of indirection in the lookup, which speeds things up nicely.
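As a rough illustration of the composite-key idea, the sketch below packs two fields into a single byte-array key, so that a lookup on the pair becomes one key lookup instead of a join. It assumes both fields are longs, which may not match the real schema, and it uses only `java.nio.ByteBuffer` rather than BDB JE's own binding classes. The `CompositeKey` name is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: encodes two long fields as one fixed-width key.
final class CompositeKey {
    private CompositeKey() {}

    // ByteBuffer is big-endian by default, so the first field
    // occupies bytes 0..7 and the second field bytes 8..15.
    static byte[] encode(long first, long second) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(first)
                .putLong(second)
                .array();
    }

    // Recovers the two fields from a key produced by encode().
    static long[] decode(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        return new long[] { buf.getLong(), buf.getLong() };
    }
}
```

With a fixed-width big-endian layout like this, keys that share the same first field sit next to each other under unsigned byte-wise comparison, at least for non-negative values.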