tags:

views:

239

answers:

4
+5  Q: 

Theory of caching

Is there a unified theory of caching? That is, collections of theorems and algorithms for constructing caches and/or optimizing for them?

The question is deliberately broad, because the results I am looking for are also broad. Formulas for maximum achievable speedup, metrics for caching algorithms, stuff like that. A university-level textbook would probably be ideal.

+1  A: 

Have you checked the wiki entry Cache algorithms as a starting point?

Also, Intro to Caching,Caching algorithms and caching frameworks part 1

Mitch Wheat
+1  A: 

If you can assume that a cache hit is much faster than a cache miss, you will find that overtime, even if you have only cache misses, using a cache will still be as fast or faster than not using a cache.

See below for the math:

Number of hits = NumRequests - #CacheMisses

AverageTime = ((NumRequests-#CacheMisses) * TimePerHit + #CacheMisses * TimePerMiss)/NumRequests

If we then assume that NumRequests is infinity (this is a limit problem, don't fear the calculus), we can see this:

AverageTime = Infinity*TimePerHit/Infinity - #CacheMisses*TimePerHit/Infinity + #CacheMisses*TimePerMiss/Infinity

Both terms with the #CacheMisses goes to zero, but the whole equation resolves to:

AverageTime = TimePerHit

Granted this is for when the number of requests is infinity, but you can see how this will easily speed up your system by using a cache.

samoz
Your calculation looks very good. Unfortunately it implies the assumption that the number of cache misses is constant, which seems highly unlikely.The correct number would be:HitProbability * TimePerHit + (1 - HitProbability) * TimePerMiss
Jørgen Fogh
I know the math is a little iffy; this is a proof I had to learn for a class I took last fall and I just tried to recall it from there.Addressing your point though, since I have CacheHits vs CacheMisses, this is a ratio, so wouldnt this be the same thing as using a hit probability?
samoz
Not quite. Assuming that there is a constant non-zero probability of cache misses, there will be infinitely many misses if infinite trials are performed.Your formula would be correct if the cache is big enough to hold every piece of data which will ever be requested. Then there is a constant upper bound to the number of misses.
Jørgen Fogh
A: 
Chris Harris
+1  A: 

I talked to one of the professors at my school, who pointed me towards online algorithms, which seems to be the topic I am looking for.

There is a great deal of overlap between caching algorithms and page replacement algorithms. I will probably edit the WikiPedia pages for these topics to clarify the connection, once I have learned more about the subject.

Jørgen Fogh