views:

167

answers:

7

Maybe this is slightly academic, but if I implement a cache for speeding up an application, how should I best handle cache misses? (In my case, the language would be Java, but maybe the answer can be more general.)

Throw an exception:

ResultType res;
try {
    res = Cache.resLookup(someKey);
} catch (NotFoundException e) {
    res = Cache.resInsert(someKey, SlowDataSource.resLookup(someKey));
}

Ask before fetch:

ResultType res;
if (Cache.contains(someKey) {
    res = Cache.resLookup(someKey);
} else {
    res = Cache.resInsert(someKey, SlowDataSource.resLookup(someKey));
}

Return null:

ResultType res;
res = Cache.resLookup(someKey);
if (null == res) {
    res = Cache.resInsert(someKey, SlowDataSource.resLookup(someKey));
}

Throwing an Exception seems wrong, after all, this isn't an error. Letting the Cache do a look up for contains() and then again to retrieve the data seems wasteful, especially as this would occur every time. And checking for null of course requires that null can never be a valid result...

+1  A: 

I would tend towards a checked exception, since you can't inadvertently ignore it and accidently return a null. I'm assuming (unlike most people here) that a cache miss is an unusual scenario.

I also assume you're talking about the cache internal implementation. From the client perspective this should be invisible.

Brian Agnew
Yes, this would be internal to the cache, the code doesn't make that clear, sorry.
Hanno Fietz
-1. The first time a value is ever used it will not be in the cache, thus a cache miss is NORMAL not exceptional.
KitsuneYMG
-1 for checked exceptions
cletus
@kts - it's not exceptional the *first* time, no
Brian Agnew
Using exceptions for program flow is a bad design/practice
matt b
@cletus - my point is that it is checked so you have to check the value returned, which can be null. Note that this isn't exposed to the client and the client doesn't have to handle it. As such, I think it's a reasonable use of an exception. Is it flow control ? Well, we could argue endlessly about that. But I wouldn't expect it to behave in that fashion often given that it's a cache, and if designed/managed properly, hits would occur much more frequently
Brian Agnew
@matt - how do you use exceptions if *not* for flow control. e.g. if you ask a user to enter as number, and then you use a NumberFormat object, you'll get an exception if the user hasn't entered a proper number. Yes, it's invalid input. Yes, it's exceptional. Yes, by handling this and reporting to the user you've introduced flow control
Brian Agnew
@Brian. Since a cache is never guaranteed to contain a value, and in memory critical times never will, a cache miss is never exceptional. As to hits and misses: In some use cases (any depending on end-user inputs) most cached values are never used more than once, meaning that misses > hits. The programmer felt that the memory/time tradeoff was worth it in the case the same value needed to be fetched again.
KitsuneYMG
+5  A: 

The first is excessive I think and not a good use for exceptions. Do you have an expectation that there will be a cache hit? A cache miss is a fairly normal occurrence I would think and thus an exception becomes simply flow control. Not good imho.

The second is a race condition. There is a time delay between checking on the existence of the cache entry and querying it. That could lead to all sorts of trouble.

Returning null is probably appropriate in the general sense but that comes with some qualifications.

Firstly, what type of cache is it? Ideally you'd be talking to a read-through cache in which case if it's not in the cache it'll simply get it from the source, which is not the style of code you've written there.

Secondly, the get then insert is another race condition. Look at the interface for ConcurrentHashMap for a good general way of dealing with this kind of thing. Most notably, the putIfAbsent() call is an atomic operation that does the equivalent of you're third call.

cletus
Good point about the race condition, thanks. But isn't the third, too?
Hanno Fietz
yes, the third one also has race conditions - the condition is that after reading, you might overwrite an entry placed in the cache just after you've read it (as null), but before you were able to insert.
Chii
There properly is a race condition in the third example, but I don't see how that could harm anybody, since the same result should be found and there is no way in which there wouldn't be a correct result in the cache.
tomjen
+4  A: 

The last option (if null == result) is best, in my opinion.

A cache miss is not an exceptional condition, and should be handled in the normal code flow.

And if the action of checking if something exists in the cache may be a somewhat expensive operation (e.g. the network overhead of a memcached call), it shouldn't be a separate operation. Also, the value of contains() may change before you actually retrieve the item, if the cache is shared among threads.

Avi
I agree, throwing an exception when the cache is missed is a BAD idea since this will happen the first time you use a new value.
KitsuneYMG
A: 

Since you are speeding up an existing api via caching the invisible one (2nd one) would apply. I say this as I'm assuming that the api exists and you're not prematurely optimising.

Preet Sangha
A: 

(3) looks like the easiest to read and most performant (even though the actual difference in performance between these alternatives is problably neglible).

With (2) you have to make two lookups in the cache (first "contains" and then "resLookup").

With (1) you create an additional object, the code gets more complicated and harder to read, and a cache miss isn't a exceptional case.

D. Wroblewski
What's the additional object in case 1 ?
Brian Agnew
The NotFoundException
D. Wroblewski
A: 

From a 'clear code' perspective

  1. A cache-miss is not an exception but a normal case. So I'd not use an Exception here.
  2. null-values should be avoided when ever possible. Choose something meaningful instead
  3. This leaves option 'ask before fetch' or a variation of your third option, where you don't return 'null' but a special ResultType object that signals a cache-miss

Example:

public class ResultType {

  public final static ResultType CACHE_MISS = new ResultType();

  // ... rest of the class implementation

}

and later on

ResultType res;
res = Cache.resLookup(someKey);
if (ResultType.CACHE_MISS == res) {
    res = Cache.resInsert(someKey, SlowDataSource.resLookup(someKey));
}

Advantage over 'null' solution: the reader now immediatly knows that is 'if' handles a cache miss.

Andreas_D
+1  A: 

What about a 4th option? You could use a holder for the return value, and have the lookup return a boolean for success:

ResultHolder result = new ResultHolder();
if(!cache.resLookup(someKey, result))
{
    // get from slower source and insert to cache
}

if(result.value == null)
{
    // special case if you wanted null as a valid value
}

This is basically your third option, keeping a single call, but if you wanted to have null as a value you could.

CodeGoat
+1 for an example that allows null values.
Hanno Fietz