views:

100

answers:

5

Hi All,

I'm looking for a persistent hash structure in java, a simple key-value store, where key is a unique string and value is an int. The value of a key is to be incremented each time an existing key is added to the store.

I need this to be quite large - possibly 500m - 1bn keys. I've been evaluating tokyo-cabinet http://fallabs.com/tokyocabinet/javadoc/ but not sure how well it will scale - insert times seem to be getting longer as the hash grows.

Any ideas on what might be appropriate?

Thanks

Edit: In order to reduce disk I/O I'm going to be caching data in an in-memory HashMap, then updating the persistent hash in one go when the cache grows to a certain size.

Edit2: One of the reasons for the persistence is that I have limited RAM, 4GB, so I can't fit a big struture into memory.

+2  A: 

Use a database not a hash. Even for a database 500M rows is getting quite large. How many updates are you expecting per second?

Tony Ennis
Would a NoSQL db be appropriate - MongoDB for example? These are essentially a key-value store right?
Richard
A: 

So, if I understand correctly, Redis might be an option. You can issue INCR [key] commands to atomically increment the value associated with that key. If the key does not exist, its set to zero and then incremented (resulting in one). According to the docs, INCR is a constant-time operation. Speed is a primary design goal for Redis.

Redis is able to persist itself to file, and you can control the parameters on how that happens.

romacafe
Sounds like Redis might have to fit entirely in memory. From the notes "In order to be very fast but at the same time persistent the whole dataset is taken in memory". I am limited by 4GB RAM.
Richard
It does have virtual memory capabilities, http://code.google.com/p/redis/wiki/VirtualMemoryUserGuide. Its also doesn't have to run locally on the same host as your JVM. Of course, it depends on how much freedom your organization gives you in terms of what you can install in your production environment...
romacafe
Well, there is this caveat: "WARNING: Because keys can't be swapped out, Redis will not be able to honour the vm-max-memory setting if the keys alone are using more space than the limit." I guess that rules out Redis for you, unless you just get a really big box to run it on...
romacafe
A: 

I think Memcached is good option for your case along with a suitable database in the backend.

Emil
+1  A: 

Have you checked out Berkeley BD Java Edition? They have a Collections-compatible API (see also the Javadoc for StoredMap).

larsmans
+2  A: 

I thing Megamap is what you are looking for: http://megamap.sourceforge.net/. Here is a short description of Megamap from its homepage:

MegaMap is a Java implementation of a map (or hashtable) that can store an unbounded amount of data, limited only by the amount of disk space available. Objects stored in the map are persisted to disk. Good performance is achieved by an in-memory cache. The MegaMap can, for all practical reasons, be thought of as a map implementation with unlimited storage space.

Skarab
Looks interesting, I'll check it out, thanks
Richard
Thanks again for the suggestion - but looks unmaintained - hasn't been updated since 2005 :(
Richard
I would take a look on ehcache or other terracota libraries, probably one of them can help you. MegaMap was developed on the top of ehcache, so it is a good direction for searching. Maybe ehcache can do it out-of-the-box.
Skarab
This link seems to be a good start - http://ehcache.org/documentation/offheap_store.html#Configuring_caches_to_overflow_to_off-heap.
Skarab
@Skarab - thanks
Richard