views: 1348

answers: 7
I am looking for a simple in-memory (and in-process) cache for short-term caching of query data (short-term here meaning beyond request/response, i.e. session boundary). EhCache would probably work, but it looks as if it might not offer one thing that I need: a limit not on the number of objects cached, but an (approximate) limit on the amount of memory consumed by the cached data.

I understand that it is hard to figure out the exact memory usage of a given object without serialization (which I want to avoid in the general case, since its slowness would defeat the purpose for my uses), and I am fine with having to provide the size estimate myself.

So: is there a simple open source Java cache that allows defining a "weight" for cached objects, in order to limit the amount of data cached?

A: 

It's not just hard to measure - it's hard to define.

Suppose two cache entries refer to the same string -- do they both count the size of that string, even though removing either one of them from the cache wouldn't make the string eligible for garbage collection? Does neither of them count the size, even though the string may become eligible for collection if both of them are removed? What if another object not in the cache has a reference to that string?

If you can accurately describe the size you're interested in it may be possible to ascertain it programmatically - but I suspect you'll find it's hard even to decide exactly what you want.

Jon Skeet
Hi Jon, your comment makes sense overall, but there is a meaningful way to compute the size of the cache: the retained size. See my answer below.
kohlerm
I am not interested in a generic solution -- for my use case, yes, I can very easily determine an approximate size. I agree that the generic case is of course impossible. That's fine, I don't need to solve that problem. :)
StaxMan
Okay, so you want weighting by size. Do you also want weighting by last access time? How do you want those to balance? Without the recency element, you could use a PriorityQueue alongside a HashMap to work out what to evict. Keep a running total of the weights as well, so you know when to evict.
Jon Skeet
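The comment above can be sketched in code. This is only an illustrative sketch: the class name WeightedLruCache and the explicit weight parameter are invented here, and for simplicity it uses an access-ordered LinkedHashMap plus a running weight total instead of a PriorityQueue.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (names invented): an LRU cache bounded by
// caller-supplied weights rather than by entry count.
public class WeightedLruCache<K, V> {
    private final long maxWeight;
    private long totalWeight = 0;
    private final Map<K, Long> weights = new HashMap<>();
    private final LinkedHashMap<K, V> map =
            new LinkedHashMap<>(16, 0.75f, true); // access order => LRU

    public WeightedLruCache(long maxWeight) {
        this.maxWeight = maxWeight;
    }

    public synchronized void put(K key, V value, long weight) {
        remove(key);                 // replace any old entry cleanly
        map.put(key, value);
        weights.put(key, weight);
        totalWeight += weight;
        evictIfNeeded();
    }

    public synchronized V get(K key) {
        return map.get(key);         // also refreshes LRU order
    }

    public synchronized void remove(K key) {
        if (map.remove(key) != null) {
            totalWeight -= weights.remove(key);
        }
    }

    // Evict least-recently-used entries until under the weight limit.
    private void evictIfNeeded() {
        while (totalWeight > maxWeight && !map.isEmpty()) {
            K eldest = map.keySet().iterator().next();
            remove(eldest);
        }
    }
}
```

The caller supplies the weight at insertion time, which matches the question's "I am fine with having to provide size estimate myself".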
A: 

As well as guessing the memory usage of the object, a reasonable algorithm would also need to guess the cost of recreating it. A reasonable guess is that the cost of recreation is roughly proportional to memory size. The factors then cancel each other out, and you need neither. A simple algorithm is probably going to work out better.

Tom Hawtin - tackline
A: 

If you cannot make any estimates, write a cache eviction policy that flushes based on the JVM's free heap (polled via Runtime.getRuntime().freeMemory()) or triggered by a finalize() call from an orphaned object (on GC).

disown
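The heap-polling idea above might look roughly like this. The class name and threshold handling are invented for illustration; treat it as a sketch, not a production policy.

```java
import java.util.LinkedHashMap;

// Sketch (names invented): evict least-recently-used entries whenever
// the JVM's free heap falls below a caller-chosen threshold.
public class HeapPressureCache<K, V> {
    private final long minFreeBytes;
    private final LinkedHashMap<K, V> map =
            new LinkedHashMap<>(16, 0.75f, true); // access order => LRU

    public HeapPressureCache(long minFreeBytes) {
        this.minFreeBytes = minFreeBytes;
    }

    public synchronized void put(K key, V value) {
        map.put(key, value);
        evictUnderPressure();
    }

    public synchronized V get(K key) {
        return map.get(key);
    }

    private void evictUnderPressure() {
        while (!map.isEmpty() && freeHeapBytes() < minFreeBytes) {
            K eldest = map.keySet().iterator().next();
            map.remove(eldest);
        }
    }

    private static long freeHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        // freeMemory() only covers the committed heap, so add the
        // memory the JVM could still commit up to -Xmx.
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }
}
```

Note that polling free heap from inside one cache is a blunt instrument: other allocations in the process move the number, so the cache's effective size is unpredictable.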
Also see https://jira.jboss.org/jira/browse/JBCACHE-11
disown
A: 

The thing that does this job is java.lang.ref.SoftReference. Typically, you extend the SoftReference class so that the subclass contains the key.

paulmurray
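The subclassing pattern described above might look like this; the name KeyedSoftReference is invented for illustration.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

// Sketch: a SoftReference subclass that remembers its cache key, so
// that when the collector clears the value, the stale map entry can be
// identified (via the ReferenceQueue) and removed.
class KeyedSoftReference<K, V> extends SoftReference<V> {
    final K key;

    KeyedSoftReference(K key, V value, ReferenceQueue<? super V> queue) {
        super(value, queue);
        this.key = key;
    }
}
```

A cache built on this would store KeyedSoftReference values in a map and periodically poll the queue, using each cleared reference's key to purge the corresponding entry.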
+3  A: 

I agree with Paul that this is often solved with a soft reference cache, though it may evict entries earlier than you would prefer. A usually acceptable solution is to use a normal cache that evicts into a soft-reference cache and recovers entries from it on a miss when possible. This victim-caching approach works pretty well: it gives you a guaranteed baseline, plus extra retention when free memory is available.
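A rough sketch of that victim-caching idea, with invented names (VictimCache) and a count-bounded primary map for brevity:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a bounded LRU map whose evicted entries fall into a
// soft-reference "victim" map, from which they can be recovered on a
// miss for as long as memory allows.
public class VictimCache<K, V> {
    private final Map<K, SoftReference<V>> victims = new HashMap<>();
    private final LinkedHashMap<K, V> primary;

    public VictimCache(final int maxEntries) {
        primary = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxEntries) {
                    // Demote the evicted entry instead of dropping it.
                    victims.put(eldest.getKey(),
                                new SoftReference<>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    public synchronized void put(K key, V value) {
        primary.put(key, value);
        victims.remove(key);       // drop any stale demoted copy
    }

    public synchronized V get(K key) {
        V value = primary.get(key);
        if (value == null) {
            SoftReference<V> ref = victims.remove(key);
            if (ref != null && (value = ref.get()) != null) {
                primary.put(key, value);  // promote back on a hit
            }
        }
        return value;
    }
}
```
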

The memory size can be determined by running with a Java agent, and usage is pretty simple with the SizeOf utility (http://sourceforge.net/projects/sizeof). I've only used it for debugging purposes, and I'd recommend benchmarking the overhead before adopting it for normal usage.

In my caching library, I am planning to add the ability to plug in an evaluator once the core algorithm is implemented. That way you could store a collection as the value but bound the cache by the sum of all collection sizes. I have seen unbounded collections used as cache values cause OutOfMemoryErrors, so having control is quite handy.

If you really need this, and I'd advise not to, we could enhance my current implementation to support this. You can email me, ben.manes-at-gmail.com.

Ben Manes
This actually sounds like something that could work once you add it. Like I said, I don't need exact usage; I can easily figure out approximate costs for byte[]s and such. I just need a cache that can make use of those weights.
StaxMan
Please email me so we can talk about this more. I believe you could decorate a LinkedHashMap for this without much difficulty. Doing so would also avoid a race condition I would otherwise have to resolve, since I am not using a single lock to guard the entire structure. So I'd prefer to push this feature off for now...
Ben Manes
A: 

How about using a simple LinkedHashMap with its LRU mode enabled, and wrapping every value in a SoftReference, e.g. cache.put(key, new SoftReference<>(value))?

This would limit your cache to the amount of available memory without killing the rest of your program, because Java clears soft references when there is memory demand -- not all at once, but (usually) oldest first. If you add a ReferenceQueue to your implementation, you can also remove the stale entries (key present, value cleared) from the map.

This would free you from calculating the size of the entries and keeping track of the sum.
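A minimal sketch of this approach; the name SoftLruCache is invented, and the cache is bounded by entry count here for brevity:

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: an access-ordered LinkedHashMap holding SoftReference
// values, with a ReferenceQueue used to purge entries whose values the
// collector has already cleared.
public class SoftLruCache<K, V> {
    // Soft reference that remembers its key, so the stale map entry
    // can be located once the value has been collected.
    private static class Entry<K, V> extends SoftReference<V> {
        final K key;
        Entry(K key, V value, ReferenceQueue<? super V> q) {
            super(value, q);
            this.key = key;
        }
    }

    private final ReferenceQueue<V> queue = new ReferenceQueue<>();
    private final LinkedHashMap<K, Entry<K, V>> map;

    public SoftLruCache(final int maxEntries) {
        map = new LinkedHashMap<K, Entry<K, V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(
                    Map.Entry<K, Entry<K, V>> eldest) {
                return size() > maxEntries; // plain count-bounded LRU
            }
        };
    }

    public synchronized void put(K key, V value) {
        purgeStale();
        map.put(key, new Entry<>(key, value, queue));
    }

    public synchronized V get(K key) {
        purgeStale();
        Entry<K, V> e = map.get(key);
        return e == null ? null : e.get();
    }

    // Drop map entries whose soft values were cleared by the collector.
    @SuppressWarnings("unchecked")
    private void purgeStale() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            map.remove(((Entry<K, V>) ref).key);
        }
    }
}
```
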

ReneS
It's not really a good idea to use SoftReferences for cache implementations, at least not in an app server. The reason is that you cannot control very well how long they live.
kohlerm
Well, it depends on your needs. Usually a cache needs only limited predictability. An LRU algorithm backed by a soft-reference fallback for when memory is low is not too bad. You can improve on that with a two-level cache: hard references first, soft references after some time.
ReneS
The point is that the lifetime of SoftReferences depends only on the free memory of the app server. You cannot control it independently per application.
kohlerm
Why should I control it? This is the nice part: I can use the memory I have for caching (hence the word caching), and if I do not have the memory... hey, I have to recalculate things.
ReneS
Yes, if you have a single-user desktop application that might work. But on an app server you could end up with several applications competing for the available memory, and you cannot control how much each gets.
kohlerm
You control the maximum with the -Xmx setting. On an app server there are always many applications competing. The trick is to figure out whether you want to trade memory for speed or vice versa.
ReneS
A: 

It's possible to define a meaningful measure of the memory usage of a cache: you could compute its "retained size". Unfortunately, computing the retained size is roughly as costly as a full GC, so it's probably not an option. In certain JVM languages (Clojure?) you could in theory make sure that no objects in the cache are referenced from outside the cache, and then you could monitor its real size.

kohlerm
However, that measure doesn't take account of the fact that pushing either one of two entries out of the cache might free almost no space, but pushing *both* out might free a very large amount.
Jon Skeet
Hi Jon, you just need to compute the "retained set" for the whole cache: you simulate that the cache "root" object (assume for now there is one) is removed from memory, run a simulated full GC, and count which objects would be reclaimed.
kohlerm