I have a Java web app that uses a third-party web service on the back end. Each web service call adds latency, which I want to avoid whenever possible. Also, my app is only allowed to make a certain number of web service calls per day, so it's best not to make a call unless absolutely needed.

My current solution is to cache the web service results in Memcached, and this works well. Basically, we are utilizing RAM to cache the web service results.

However, we would like to take this to the next level. We also have disk space that we would like to use as a disk cache for caching web service results. I'd like a system where first we check the RAM cache (which could be Memcached, but doesn't have to be). When a RAM cache miss happens, we would then fall back to checking the disk cache. And when a disk cache miss happens, we would then fall back to calling the web service. Whenever we retrieve new web service results, we would then update both the RAM cache and the disk cache.
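To make the lookup order above concrete, here is a minimal sketch in Java. The `CacheTier` interface, `MapTier`, and `TieredCache` are hypothetical names made up for illustration, and both tiers are plain in-memory maps so the example runs on its own; in practice the RAM tier would wrap a Memcached client and the disk tier whatever disk store is chosen. Copying a disk hit back into the RAM tier is an assumption on my part, not something stated above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical minimal cache interface; a Memcached client or a disk store would sit behind it. */
interface CacheTier {
    Optional<String> get(String key);
    void put(String key, String value);
}

/** In-memory stand-in for a tier, so the sketch runs without external services. */
class MapTier implements CacheTier {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    public Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    public void put(String key, String value) {
        data.put(key, value);
    }
}

/** Checks RAM first, then disk, and only then calls the web service. */
class TieredCache {
    private final CacheTier ram;
    private final CacheTier disk;
    private final Function<String, String> webService; // stand-in for the third-party call

    TieredCache(CacheTier ram, CacheTier disk, Function<String, String> webService) {
        this.ram = ram;
        this.disk = disk;
        this.webService = webService;
    }

    String get(String key) {
        Optional<String> hit = ram.get(key);
        if (hit.isPresent()) {
            return hit.get();
        }
        hit = disk.get(key);
        if (hit.isPresent()) {
            // Assumption: promote disk hits so the next read is served from RAM.
            ram.put(key, hit.get());
            return hit.get();
        }
        // Miss in both tiers: spend one web service call and fill both caches.
        String fresh = webService.apply(key);
        ram.put(key, fresh);
        disk.put(key, fresh);
        return fresh;
    }
}
```

With something like `new TieredCache(new MapTier(), new MapTier(), key -> callService(key))`, where `callService` is a placeholder for the real client, repeated reads of the same key should cost only one web service call.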

One possibility would be to use a SQL database as the piece of the system that uses the disk for storage. But this seems less than ideal. Databases tend to need a lot of babysitting. They often involve files (either the db itself or the transaction log) that grow without bound, so you need to manage what happens when these growing files start to cause the filesystem to run out of space.

What I want instead for the disk-based part of the system is something where I can tell it how much disk space to use, and it will guarantee that it will never use more than that. And when it runs out of space, it will automatically start throwing away the least recently used key-value pairs. I definitely don't need ACID, so there should be no transaction logs.
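To illustrate the eviction behavior I mean, here is a rough sketch of the policy only. `BoundedLruStore` is a hypothetical class that keeps values in memory and counts value bytes against a fixed budget; a real disk store would apply the same rule to files or blocks and would also have to account for keys and metadata.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-bounded store: never holds more than maxBytes of values, evicts LRU first. */
class BoundedLruStore {
    private final long maxBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    BoundedLruStore(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized byte[] get(String key) {
        return entries.get(key); // a hit also marks the key as most recently used
    }

    synchronized void put(String key, byte[] value) {
        if (value.length > maxBytes) {
            // The value can never fit: drop any older version and store nothing.
            byte[] old = entries.remove(key);
            if (old != null) {
                usedBytes -= old.length;
            }
            return;
        }
        byte[] old = entries.put(key, value);
        if (old != null) {
            usedBytes -= old.length;
        }
        usedBytes += value.length;
        // Evict least recently used entries until the budget is respected again.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    synchronized boolean contains(String key) {
        return entries.containsKey(key); // does not change the access order
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

The point of the sketch is that the size limit is enforced on every write, so there is nothing that grows without bound and nothing to clean up by hand.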

So I am looking for either: 1) a disk-based key-value storage system that can act as the "failover" when Memcached has a cache miss OR 2) a single system that would replace Memcached and provide both the RAM cache and the disk cache.

Other important qualities that I want:
1) Like Memcached, I want a caching system that requires no babysitting.
2) Like Memcached, I want the cache to shard across several servers, with each object living on exactly one server.
3) Like Memcached, I want something that's fairly easy to plug in and use. I don't want to have to write a ton of code to get this working.
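For the sharding requirement (each object living on exactly one server), this is roughly the client-side behavior I have in mind. It is a simplified sketch: `ShardPicker` is a hypothetical class, the server names in the usage note are placeholders, and plain modulo hashing is used for brevity, even though it remaps most keys when the server list changes (consistent hashing is the usual remedy).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical helper that maps every key to exactly one server from a fixed list. */
class ShardPicker {
    private final List<String> servers;

    ShardPicker(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers); // defensive copy
    }

    String serverFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the modulo is a valid index.
        int index = (int) (crc.getValue() % servers.size());
        return servers.get(index);
    }
}
```

For example, `new ShardPicker(Arrays.asList("cache1:11211", "cache2:11211")).serverFor("user:42")` returns the same server every time for that key, so the object is never duplicated across servers.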

Other systems that I've already looked at:
1) I believe Redis doesn't fit the bill here, since its disk cache is just a mirror of what's in RAM. I want the RAM cache to be a small subset of the disk cache.
2) EhCache has a "persistent disk store which stores data between VM restarts", but that's not very similar to what I've described above.

Apache JCS (Java Caching System) looks like it might be a good fit, so I'd love to hear opinions about it from those who have used it.

A: 

MemcacheDB might be the answer you're looking for. Reddit uses it for its "permacache".

Ben Hughes
Reddit actually dropped memcachedb for Cassandra: http://blog.reddit.com/2010/03/she-who-entangles-men.html
Frank Farmer
+1  A: 

I used ehcache for RAM/disk-based caching and it worked fine. How many objects to keep in memory and how many to keep on disk is set in configuration, outside the code, so no code changes are needed. There is not a lot more to say: it is a cache and it works just fine.

I used it to store wafermaps to avoid fetching them from a remote database. I sized the disk cache in such a way as to be able to keep several months of production near the app server, resulting in significant time savings, especially when some urgent rework must be done.

Peter Tillemans
Thanks. This may be the solution I'm looking for. I see now that you can configure a DiskStore and specify the maxElementsOnDisk parameter.
Mike W
A: 

You need Project Voldemort

zengr
A: 

Cassandra. There are over a dozen NoSQL solutions out there that store to both memory and disk. Few if any of them are as battle-tested as Cassandra. It is used in production by Facebook, Reddit, and Digg, to name a few.

Frank Farmer
A: 

Use Redis. It supports all memcache operations, saves data to disk, and is damn fast. Reads are slow in Cassandra, so I wouldn't go for that.

Kalpesh Patel