ansaurus

Question

Cache Invalidation - Is there a General Solution?

Answer 1

A:

Perhaps cache-oblivious algorithms would be the most general (or at least, the least dependent on hardware configuration), since they'll use the fastest cache first and move on from there. Here's an MIT lecture on it: Cache Oblivious Algorithms

CookieOfFortune 2009-07-27 14:50:19

I think that he's not talking about hardware caches - he's talking about his getData() code having a feature that "caches" the data he got from a file into memory.

Alex319 2009-07-27 14:56:41

Answer 2

+1 A:

I'm working on an approach right now based on PostSharp and memoizing functions. I've run it past my mentor, and he agrees that it's a good implementation of caching in a content-agnostic way.

Every function can be marked with an attribute that specifies its expiry period. Each function marked in this way is memoized and the result is stored into the cache, with a hash of the function call and parameters used as the key. I'm using Velocity for the backend, which handles distribution of the cache data.
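To make the idea concrete, here is a minimal sketch in Python rather than PostSharp/C#. It is not my actual implementation: the memoize decorator, the in-process _cache dict (standing in for a distributed backend such as Velocity), and the clock parameter are all assumptions made for illustration.

```python
import functools
import hashlib
import pickle
import time

# Hypothetical stand-in for a distributed cache backend.
_cache = {}

def memoize(expiry_seconds, clock=time.monotonic):
    """Cache a function's results for expiry_seconds, keyed on the call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key = hash of the function identity plus its arguments.
            # Assumes the arguments are picklable.
            raw = pickle.dumps(
                (func.__module__, func.__qualname__, args, sorted(kwargs.items()))
            )
            key = hashlib.sha256(raw).hexdigest()
            now = clock()
            entry = _cache.get(key)
            if entry is not None and now - entry[0] < expiry_seconds:
                return entry[1]          # still within its expiry period
            value = func(*args, **kwargs)
            _cache[key] = (now, value)   # store with the time it was computed
            return value
        return wrapper
    return decorator
```

Marking a function is then just a matter of putting @memoize(60) above it. Note that this only bounds how stale a result can be; it does not detect that the underlying data changed.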

Chris McCall 2009-07-27 14:54:29

Answer 3

A:

Is there a general solution or method to creating a cache, to know when an entry is stale, so you are guaranteed to always get fresh data?

No, because all data is different. Some data may be "stale" after a minute, some after an hour, and some may be fine for days or months.

Regarding your specific example, the simplest solution is to have a 'cache checking' function for files, which you call from both getData and transformData.

DisgruntledGoat 2009-07-27 14:59:13

Answer 4

+2 A:

If you're going to getData() every time you do the transform, then you've eliminated the entire benefit of the cache.

For your example, one solution would be to store the filename and last-modified time of the source file alongside the transformed data when you generate it. You already stored this in whatever data structure getData() returned, so you just copy that record into the data structure returned by transformData(). Then, when you call transformData() again, check the file's last-modified time and regenerate only if it has changed.
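A rough sketch of that in Python, under some assumptions: get_data and transform_data are stand-ins for your getData() and transformData(), the Cached record and the two dicts are hypothetical, and the upper() call is only a placeholder for the real transformation.

```python
import os
from dataclasses import dataclass

@dataclass
class Cached:
    filename: str
    mtime_ns: int      # last-modified time of the source file when cached
    value: object

_data_cache = {}       # filename -> Cached raw data
_transform_cache = {}  # filename -> Cached transformed data

def _is_fresh(entry):
    # An entry is fresh if the file's mtime still matches the recorded one.
    return entry is not None and os.stat(entry.filename).st_mtime_ns == entry.mtime_ns

def get_data(filename):
    entry = _data_cache.get(filename)
    if not _is_fresh(entry):
        mtime_ns = os.stat(filename).st_mtime_ns
        with open(filename, encoding="utf-8") as f:
            entry = Cached(filename, mtime_ns, f.read())
        _data_cache[filename] = entry
    return entry

def transform_data(filename):
    entry = _transform_cache.get(filename)
    if not _is_fresh(entry):
        source = get_data(filename)
        # Copy the filename and mtime record from the source data.
        entry = Cached(source.filename, source.mtime_ns, source.value.upper())
        _transform_cache[filename] = entry
    return entry.value
```

The check costs one stat call per lookup, which is much cheaper than re-reading and re-transforming the file, but it will miss a change that leaves the modification time untouched.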

Alex319 2009-07-27 15:00:56

Answer 5

+2 A:

What you are talking about is lifetime dependency chaining: one thing is dependent on another which can be modified outside of its control.

If you have an idempotent function from a, b to c where, if a and b are the same, then c is the same, but the cost of checking b is high, then you either:

1. accept that you sometimes operate with out-of-date information and do not always check b
2. do your level best to make checking b as fast as possible

You cannot have your cake and eat it...

If you can layer an additional cache based on a over the top, this affects the initial problem not one bit. If you choose option 1, then you have whatever freedom you gave yourself and can thus cache more, but you must remember to consider the validity of the cached value of b. If you choose option 2, you must still check b every time but can fall back on the cache for a if b checks out.

If you layer caches you must consider whether you have violated the 'rules' of the system as a result of the combined behaviour.

If you know that a always has validity if b does then you can arrange your cache like so (pseudocode):

private map<b,map<a,c>> cache // b -> (a -> c)
private func realFunction    // (a,b) -> c

get(a, b) 
{
    c result;
    map<a,c> endCache;
    if (cache[b] expired or not present)
    {
        remove all b -> * entries in cache;   
        endCache = new map<a,c>();      
        add to cache b -> endCache;
    }
    else
    {
        endCache = cache[b];     
    }
    if (endCache[a] not present)     // important line
    {
        result = realFunction(a,b);
        endCache[a] = result;
    }
    else
    {
        result = endCache[a];
    }
    return result;
}
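
For anyone who wants something runnable, here is one possible rendering of the same layout in Python. The pseudocode leaves "expired" undefined, so this sketch assumes a simple time-to-live on b; the LayeredCache class, b_ttl_seconds, and the clock parameter are all names invented for the example.

```python
import time

class LayeredCache:
    """Outer cache keyed on b, inner cache keyed on a (b -> (a -> c))."""

    def __init__(self, real_function, b_ttl_seconds, clock=time.monotonic):
        self._real_function = real_function   # (a, b) -> c
        self._b_ttl = b_ttl_seconds           # assumed expiry rule for b
        self._clock = clock
        self._cache = {}                      # b -> (stored_at, {a: c})

    def get(self, a, b):
        now = self._clock()
        entry = self._cache.get(b)
        if entry is None or now - entry[0] >= self._b_ttl:
            # b expired or not present: drop every a -> c stored under it.
            end_cache = {}
            self._cache[b] = (now, end_cache)
        else:
            end_cache = entry[1]
        if a not in end_cache:                # the "important line"
            end_cache[a] = self._real_function(a, b)
        return end_cache[a]
```

Only b carries an expiry here, which is valid precisely because a is assumed to remain valid whenever b does.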

Obviously, successive layering of a further input (say x) is trivial, so long as at each stage the validity of the newly added input matches the a:b relationship for x:b and x:a.

However, it is quite possible that you could get three inputs whose validity was entirely independent (or was cyclic), so no layering would be possible. This would mean the line marked // important line would have to change to:

if (endCache[a] expired or not present)

ShuggyCoUk 2009-07-27 15:07:06
