How would you implement a cache class that supports timeouts using the new concurrent collections in .NET 4?

The cache class will typically hold hundreds of thousands of entries. A very large part of the cache may expire simultaneously.

Unlike a typical web cache, which may shrink due to memory pressure, this class should remove objects automatically only when they time out.

+1  A: 

Why do you need to implement a custom caching class? Isn't the default implementation enough? .NET 4.0 has a whole new assembly dedicated to caching, System.Runtime.Caching. Example of caching a value for 10 minutes:

using System.Runtime.Caching;

var cache = MemoryCache.Default;
// Absolute expiration: the entry is removed 10 minutes from now.
cache.Add("key", new object(), DateTimeOffset.Now.AddMinutes(10));
Darin Dimitrov
Hi Darin, thanks for responding. This question was meant as a challenge. Although it is inspired by a real problem, I am not actually about to implement a cache. In fact, my choice of the term 'cache' is unfortunate. What I am interested in is a dictionary with timeouts: as long as an entry has not timed out, the dictionary must not purge it under any circumstances. I modified the question to clarify this. MemoryCache, while otherwise a useful tool, purges objects due to memory pressure, which makes it unsuitable for my hypothetical scenario.
Teleo
@Teleo, thanks for this clarification. You are absolutely correct that `MemoryCache` will purge due to memory pressure.
Darin Dimitrov
A: 

Take a step back and consider how a size-based cache works. An LRU policy will remove the least recently used entry. An entry is considered to be used when it is accessed (read) or mutated (written). The most common algorithm is to have a doubly-linked list cross-cut the hashtable so that a table lookup finds the entry in the list chain in O(1) time. The list maintains the ordering where the head is the least recently used entry and the tail is the most recently used, so an accessed entry is moved to the tail. On an eviction, the head entry is the victim: it is unlinked and removed from the table.
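[Editor's illustration] A minimal, single-threaded C# sketch of the structure described above, assuming hypothetical class and member names; it is not a concurrent implementation:

using System.Collections.Generic;

// A Dictionary gives O(1) lookup into a LinkedList that maintains recency
// order: head = least recently used, tail = most recently used.
public class LruCache<TKey, TValue>
{
    private readonly int capacity;
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();

    public LruCache(int capacity) { this.capacity = capacity; }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            // A read counts as a use: move the node to the tail in O(1).
            order.Remove(node);
            order.AddLast(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            order.Remove(node);
        }
        node = order.AddLast(new KeyValuePair<TKey, TValue>(key, value));
        map[key] = node;
        if (map.Count > capacity)
        {
            // Overflow: the head is the victim; unlink it and remove it
            // from the table.
            LinkedListNode<KeyValuePair<TKey, TValue>> victim = order.First;
            order.RemoveFirst();
            map.Remove(victim.Value.Key);
        }
    }
}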

Now think about expiration. The policies found in practice will reset the timeout based on the last read (time-to-idle) or the last write (time-to-live). This may sound similar to the LRU algorithm described above, as it maintains ordering based on the same access patterns. The only difference is when the cache determines that an eviction is needed: size vs. time. Thus, we can apply the same algorithm and just change how we detect an overflow!

This means that you can implement an LRU algorithm and generalize the evaluation predicate to allow switching between a capacity bound and a time bound. The result is a time- and space-efficient expirable cache.
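[Editor's illustration] A hedged sketch of that generalization, reusing the list-plus-table structure from the sketch above; the delegate signature and all names are invented for illustration:

using System;
using System.Collections.Generic;

// The overflow test is abstracted into a predicate over the coldest
// entry, so a size bound and a time bound share one eviction loop.
public class Entry<TKey, TValue>
{
    public TKey Key;
    public TValue Value;
    public DateTime LastAccess; // refreshed on read (time-to-idle) or write (time-to-live)
}

public delegate bool EvictionPredicate<TKey, TValue>(int count, Entry<TKey, TValue> coldest);

public class BoundedMap<TKey, TValue>
{
    private readonly LinkedList<Entry<TKey, TValue>> order =
        new LinkedList<Entry<TKey, TValue>>();
    private readonly Dictionary<TKey, LinkedListNode<Entry<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<Entry<TKey, TValue>>>();
    private readonly EvictionPredicate<TKey, TValue> shouldEvict;

    public BoundedMap(EvictionPredicate<TKey, TValue> shouldEvict)
    {
        this.shouldEvict = shouldEvict;
    }

    // Called after every read/write; Put and TryGet would maintain the
    // list exactly as in the LRU sketch above.
    private void EvictWhileNeeded()
    {
        // The head is always the coldest entry, so the loop stops at the
        // first entry the predicate keeps.
        while (order.First != null && shouldEvict(map.Count, order.First.Value))
        {
            map.Remove(order.First.Value.Key);
            order.RemoveFirst();
        }
    }
}

// Capacity bound: (count, e) => count > 100000
// Time bound:     (count, e) => DateTime.UtcNow - e.LastAccess > TimeSpan.FromMinutes(10)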

Ben Manes
Hi Ben, thanks for taking the time to answer. I am interested in time-based expiration only, with no risk of losing data due to anything but a timeout. This is basically a Dictionary with timeouts, but a highly concurrent one.
Teleo
@Teleo, I understood that. My point is that the algorithm to build an LRU cache can easily be adapted to use time-based eviction instead. It is simply a matter of changing the evaluation criteria from a size constraint to a time constraint. If a concurrent LRU can be implemented, then a concurrent expirable cache can be implemented as well. A concurrent LRU is actually fairly straightforward, though non-obvious; I can provide those details and a Java implementation if you're interested.
Ben Manes
@Ben, I found your `ConcurrentLinkedHashMap` and enjoyed reading it. I see your point about using the same algorithm with some adjustments (like eschewing the maintenance of _maximum weight_). Nevertheless, at its core your code uses the Java concurrent collections, and their .NET counterparts, at least, are imperfect for the job. For instance, removing an item from ConcurrentDictionary locks one of the lock stripes. Hundreds of thousands of removes would perform terribly, acquiring the same Monitor locks again and again and taking a lot of time. (cont...)
Teleo
@Teleo, you can work around the contention by evaluating the freshness of the entry on retrieval and faking a cache miss if it has expired. Optionally this can be paired with a scavenger thread as well. I did this in my ExpirableCache tutorial (see the project's wiki), which I wrote before realizing the LRU characteristics, so it uses an O(lg n) priority queue. The approach I've been adopting when writing concurrent algorithms is to work around the lock contention rather than try to remove the locks entirely. This tends to be much simpler, and more scalable when the lock-free algorithm is overly complicated.
Ben Manes
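[Editor's illustration] For concreteness, a rough C# sketch of the read-path idea described in the comment above (an expired entry is treated as a miss and left for an optional scavenger). The class and member names are hypothetical and this is not code from the ExpirableCache tutorial:

using System;
using System.Collections.Concurrent;

public class ExpiringDictionary<TKey, TValue>
{
    private class Entry
    {
        public readonly TValue Value;
        public readonly DateTime WrittenAt;
        public Entry(TValue value) { Value = value; WrittenAt = DateTime.UtcNow; }
    }

    private readonly ConcurrentDictionary<TKey, Entry> map =
        new ConcurrentDictionary<TKey, Entry>();
    private readonly TimeSpan timeToLive;

    public ExpiringDictionary(TimeSpan timeToLive) { this.timeToLive = timeToLive; }

    public void Set(TKey key, TValue value)
    {
        map[key] = new Entry(value);
    }

    public bool TryGet(TKey key, out TValue value)
    {
        Entry entry;
        if (map.TryGetValue(key, out entry) &&
            DateTime.UtcNow - entry.WrittenAt < timeToLive)
        {
            value = entry.Value;
            return true;
        }
        // Expired or absent: fake a miss. The dead entry is left for the
        // scavenger, so the read path takes no stripe lock for removal.
        value = default(TValue);
        return false;
    }

    // Optional scavenger pass, run periodically from a timer or
    // background thread. Enumeration of ConcurrentDictionary is safe
    // while the map is being mutated.
    public void Scavenge()
    {
        foreach (var pair in map)
        {
            if (DateTime.UtcNow - pair.Value.WrittenAt >= timeToLive)
            {
                // Note: a writer may refresh the entry between the check
                // and the remove; a production version would remove
                // conditionally on the observed Entry instance.
                Entry ignored;
                map.TryRemove(pair.Key, out ignored);
            }
        }
    }
}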