Hi guys,

I need to cache some entities, for example a page tree in a content management system (CMS). The system allows developers to write plugins, and those plugins can access the cached page tree. Is it good or bad to make the cached page tree mutable? By mutable I mean that the tree node objects have setters and/or the ChildPages collection exposes Add and Remove methods, so client code can set properties of the page tree nodes and add or remove tree nodes freely.

Here are my thoughts:

(1) If the page tree is immutable, plugin developers have no way to modify the tree unexpectedly, so we avoid a class of subtle bugs.

(2) But sometimes we need to change the name of a page. If the page tree is immutable, we have to call something like "Refresh()" to reload the cache after saving the change. That costs a second database hit (two in total: one to save, one to refresh), even though the second one should be avoidable. If the page tree is mutable, we can change the name directly in the cached tree to keep it up to date, so only one database hit is needed.
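To make the hit counting in (2) concrete, here is a minimal C# sketch. `FakePageStore` and `RenameStrategies` are hypothetical names invented for illustration; the "database" only counts round trips, and the cache is simplified to a plain dictionary of page names.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the CMS database: it stores page names and counts round trips.
public class FakePageStore
{
    private readonly Dictionary<int, string> _names = new Dictionary<int, string>();

    public int Hits { get; private set; }

    public void SaveName(int pageId, string name)
    {
        Hits++;
        _names[pageId] = name;
    }

    public Dictionary<int, string> LoadAll()
    {
        Hits++;
        return new Dictionary<int, string>(_names);
    }
}

public static class RenameStrategies
{
    // Immutable cache: save the change, then rebuild the cache from the database (2 hits).
    public static IReadOnlyDictionary<int, string> RenameAndRefresh(
        FakePageStore store, int pageId, string newName)
    {
        store.SaveName(pageId, newName);
        return store.LoadAll();
    }

    // Mutable cache: save the change, then patch the cached copy in place (1 hit).
    public static void RenameAndPatch(
        FakePageStore store, Dictionary<int, string> cache, int pageId, string newName)
    {
        store.SaveName(pageId, newName);
        cache[pageId] = newName;
    }
}
```

The refresh path costs two round trips and the patch path one, which is the saving described in (2); the price is that somebody must be allowed to write to the cache.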

What do you think about it? And what would you do in such a situation?

Thanks in advance! :)

UPDATE: The page tree is something like:

public class PageCacheItem {
    public string Name { get; set; }
    public string PageTitle { get; set; }
    public PageCacheItemCollection Children { get; private set; }
}

My problem here is not about the hash code, because PageCacheItem won't be put in a HashSet or Dictionary as a key.

My problem is:

If PageCacheItem (the tree node) is mutable, i.e. its properties such as Name and PageTitle have public setters, a plugin author might change those properties by mistake. The system would then be in an incorrect state (the cached data no longer matches the data in the database), and that bug is hard to track down because it is caused by a plugin, not by the system itself.

But if PageCacheItem is read-only, it may be hard to implement an efficient "cache refresh": without setters, we can't simply update the properties to their latest values.
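One possible middle ground, sketched below under the assumption that plugins are compiled into assemblies separate from the CMS core: keep the setters and the collection's Add/Remove `internal`. The CMS's own refresh code can then patch nodes in place, while plugin code only sees getters. The shape of PageCacheItemCollection here (deriving from `ReadOnlyCollection<T>`) is an assumption, since the question doesn't show it.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Plugins (other assemblies) see only getters; code inside the CMS assembly
// can still update nodes in place because the setters are 'internal'.
public class PageCacheItem
{
    public PageCacheItem(string name, string pageTitle)
    {
        Name = name;
        PageTitle = pageTitle;
        Children = new PageCacheItemCollection();
    }

    public string Name { get; internal set; }
    public string PageTitle { get; internal set; }
    public PageCacheItemCollection Children { get; }
}

// Read-only to the outside world: the public ICollection<T>.Add/Remove
// inherited from ReadOnlyCollection<T> throw NotSupportedException.
public class PageCacheItemCollection : ReadOnlyCollection<PageCacheItem>
{
    public PageCacheItemCollection() : base(new List<PageCacheItem>()) { }

    // 'Items' is the wrapped List<T>, which is mutable; only the CMS assembly can call these.
    internal void Add(PageCacheItem item) => Items.Add(item);
    internal bool Remove(PageCacheItem item) => Items.Remove(item);
}
```

Note that `internal` is an access modifier, not a security boundary: a determined plugin could still use reflection. A plugin that casts `Children` to `ICollection<PageCacheItem>` and calls `Add` gets a `NotSupportedException`.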

UPDATE 2:

Thanks guys. But I should note one thing: I'm not developing a generic caching framework, but some APIs on top of an existing caching framework. So my API is a middle layer between the underlying caching framework and the plugin authors. A plugin author doesn't need to know anything about the underlying caching framework; he only needs to know that this page tree is retrieved from the cache. And he gets a strongly typed PageCacheItem API to use, not the weakly typed "object" retrieved from the underlying caching framework.

So my question is about designing APIs for plugin authors: is it good or bad to make the API class PageCacheItem mutable (here, mutable == properties can be set from outside the PageCacheItem class)?
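A minimal sketch of such a middle layer, assuming the existing framework exposes something like untyped `Get`/`Set` by string key. `IObjectCache`, `DictionaryObjectCache`, `PageTreeCache` and the key string are all hypothetical stand-ins, not the real framework's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface standing in for the existing, weakly typed caching framework.
public interface IObjectCache
{
    object Get(string key);
    void Set(string key, object value);
}

// Trivial in-memory implementation so the sketch runs.
public class DictionaryObjectCache : IObjectCache
{
    private readonly Dictionary<string, object> _items = new Dictionary<string, object>();
    public object Get(string key) => _items.TryGetValue(key, out var value) ? value : null;
    public void Set(string key, object value) => _items[key] = value;
}

// Minimal read-only node, just for this sketch.
public sealed class PageCacheItem
{
    public PageCacheItem(string name) { Name = name; }
    public string Name { get; }
}

// The layer plugins talk to: strongly typed, hides cache keys and casting.
public class PageTreeCache
{
    private const string RootKey = "PageTree.Root"; // hypothetical cache key
    private readonly IObjectCache _cache;
    private readonly Func<PageCacheItem> _loadFromDatabase;

    public PageTreeCache(IObjectCache cache, Func<PageCacheItem> loadFromDatabase)
    {
        _cache = cache;
        _loadFromDatabase = loadFromDatabase;
    }

    public PageCacheItem GetRoot()
    {
        var cached = _cache.Get(RootKey) as PageCacheItem;
        if (cached != null) return cached;

        var root = _loadFromDatabase(); // cache miss: build the tree once and store it
        _cache.Set(RootKey, root);
        return root;
    }
}
```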

+3  A: 

Look at it this way: if the entry is mutable, it is likely that its hash code will change when the object is mutated.

Depending on the dictionary implementation of the cache, it could either:

  • be 'lost'
  • in the worst case, the entire cache will need to be rehashed
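A small C# sketch of the "lost" case (`MutableKey` and `LostKeyDemo` are hypothetical names): once a key's hash code changes while it is stored in a `Dictionary`, neither the mutated key nor a fresh key with the old value finds the entry, even though it is still counted.

```csharp
using System.Collections.Generic;

// A key whose hash code depends on a mutable property.
public class MutableKey
{
    public string Name { get; set; }

    public override bool Equals(object obj) => obj is MutableKey k && k.Name == Name;
    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

public static class LostKeyDemo
{
    // Reports whether the entry is still reachable after its key was mutated.
    public static (bool byMutatedKey, bool byOldName, int count) Run()
    {
        var key = new MutableKey { Name = "home" };
        var cache = new Dictionary<MutableKey, string> { [key] = "cached page" };

        key.Name = "start"; // hash code changes while the key is inside the dictionary

        return (cache.ContainsKey(key),
                cache.ContainsKey(new MutableKey { Name = "home" }),
                cache.Count);
    }
}
```

Calling `LostKeyDemo.Run()` gives `(false, false, 1)`: the entry still occupies the dictionary but can no longer be looked up.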

There may be valid reasons why you want 'mutable hashcodes' but I cannot see a justification here. (I have only ever needed to do this once in the last 9 years).

It would be a lot easier just to remove and replace the entry you wish to be 'mutated'.
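Remove-and-replace can be sketched with immutable values and a compare-and-swap on `ConcurrentDictionary`. `PageInfo`, `PageInfoCache` and `TryRename` are hypothetical names for illustration: instead of mutating a cached object, the cache swaps in a modified copy, so readers that already hold the old instance never see a half-updated object.

```csharp
using System.Collections.Concurrent;

// Immutable cached value: "changing" it produces a new instance.
public sealed class PageInfo
{
    public PageInfo(string name, string pageTitle) { Name = name; PageTitle = pageTitle; }

    public string Name { get; }
    public string PageTitle { get; }

    public PageInfo WithName(string name) => new PageInfo(name, PageTitle);
}

public class PageInfoCache
{
    private readonly ConcurrentDictionary<int, PageInfo> _items =
        new ConcurrentDictionary<int, PageInfo>();

    public void Put(int pageId, PageInfo info) => _items[pageId] = info;

    public PageInfo Get(int pageId) => _items.TryGetValue(pageId, out var info) ? info : null;

    // Replace the entry with a renamed copy. TryUpdate only succeeds if the entry
    // is still the instance we read, so a concurrent replacement makes us retry.
    public bool TryRename(int pageId, string newName)
    {
        while (_items.TryGetValue(pageId, out var old))
        {
            if (_items.TryUpdate(pageId, old.WithName(newName), old)) return true;
        }
        return false; // no such page in the cache
    }
}
```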

leppie
Thanks leppie, and sorry for my poor description. My problem here is not about the hash code; please see the "Update" for details. Thanks :)
Dylan Lin
@Dylan Lin: Any worthwhile cache will use a hashtable of sorts internally. Having anything worse than O(1) lookups would be extremely inefficient.
leppie
I'm assuming that @Dylan Lin means that the values will be mutable, rather than the key data, which hence does mean that the hashcode is not the problem, as it remains immutable either way.
Jon Hanna
@Jon Hanna, you're right. @leppie, I'm not going to develop a generic caching framework. What I want is to expose the full cached page tree (already retrieved from the caching framework) to the developers so they can use it easily, without having to worry much about the underlying cache API.
Dylan Lin
+1  A: 

First, I assume you mean the cached values may or may not be mutable, rather than the identifier they are identified by. If you mean the identifier too, then I would be quite emphatic about it being immutable (emphatic enough to have my post flagged for obscene language).

As for mutable values, there is no one right answer here. You've hit on the primary pro and con on either side, and there are multiple variants within each of the two options you describe. Cache invalidation is in general a notoriously difficult problem (as in the well-known quote from Phil Karlton, "There are only two hard problems in Computer Science: cache invalidation and naming things."*)

Some things to consider:

  1. How often will changes be made? If changes are rare, refreshes become easy: dump the existing cache and let it rebuild.
  2. Will the CMS be on multiple servers, or could it be in the future? If so, any invalidation information has to be shared.
  3. How bad is stale data, and how soon does it become bad? (Could you happily serve out-of-date values for the next hour or so, or would that conflict disastrously with fresh values?)
  4. Does a revalidation approach make sense for you, where after a certain time a cached value is checked to make sure it is still valid and the time-to-next-check is updated? (Alternatively, periodically dump old values and let them be retrieved from the fresh source again.)
  5. How easy is it to detect staleness in the first place? If it's hard, that can rule out some approaches.
  6. How much does the cache actually save? Could you just get rid of it?
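Point 4 might look roughly like this in C#. It's a sketch only: `RevalidatingEntry<T>` is a hypothetical class, the `isStillValid` check (for example, comparing a version number or last-modified stamp) and the `reload` delegate are assumptions supplied by the caller, and the class is not thread-safe.

```csharp
using System;

// A cached value that is re-checked once its check interval has elapsed.
public class RevalidatingEntry<T>
{
    private readonly Func<T> _reload;
    private readonly Func<T, bool> _isStillValid;
    private readonly TimeSpan _checkInterval;
    private readonly Func<DateTime> _now;
    private T _value;
    private DateTime _nextCheck;

    public RevalidatingEntry(Func<T> reload, Func<T, bool> isStillValid,
                             TimeSpan checkInterval, Func<DateTime> now = null)
    {
        _reload = reload;
        _isStillValid = isStillValid;
        _checkInterval = checkInterval;
        _now = now ?? (() => DateTime.UtcNow); // injectable clock, handy for tests
        _value = _reload();
        _nextCheck = _now() + _checkInterval;
    }

    public T Value
    {
        get
        {
            var now = _now();
            if (now >= _nextCheck)
            {
                if (!_isStillValid(_value))
                    _value = _reload();             // stale: fetch a fresh value
                _nextCheck = now + _checkInterval;  // either way, push the next check out
            }
            return _value; // between checks, a possibly stale value is served
        }
    }
}
```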

I haven't mentioned threading issues, because they are difficult with any sort of cache unless you're single-threaded (and if it's a CMS, I'm guessing it's web-based, and hence inherently multi-threaded).

One thing I will say on the matter is that a cache failure generally isn't critical: by definition, a cache failure has a fallback, namely getting the fresh value. For this reason it can be fruitful to take an approach where, rather than blocking indefinitely on the monitor (which is what lock does internally), you use Monitor.TryEnter with a timeout and have the cache operation fail if the timeout is hit. Using a ReaderWriterLockSlim and allowing a slightly longer timeout for writing can be a good approach. This way, if you hit a point of heavy lock contention, the cache will stop working for some threads, but those threads still get usable data. This will suck for performance for those threads, but not as much as lock contention would for all affected threads, and caches are a place where it is very easy to introduce lock contention into a web project that only shows up once you've gone live.
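A rough sketch of the timeout idea using `ReaderWriterLockSlim.TryEnterReadLock`/`TryEnterWriteLock`. `TimeoutCache<TKey, TValue>` and `GetOrLoad` are hypothetical names, and the timeouts are whatever you choose. When a lock can't be acquired in time, the read is treated as a miss and the write is skipped, so the caller still gets a fresh value from `load`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// A cache that gives up instead of blocking when its lock is contended.
public class TimeoutCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly TimeSpan _readTimeout;
    private readonly TimeSpan _writeTimeout;

    public TimeoutCache(TimeSpan readTimeout, TimeSpan writeTimeout)
    {
        _readTimeout = readTimeout;
        _writeTimeout = writeTimeout; // writes may be given a slightly longer timeout
    }

    public bool TryGet(TKey key, out TValue value)
    {
        value = default(TValue);
        if (!_lock.TryEnterReadLock(_readTimeout)) return false; // contention: treat as a miss
        try { return _items.TryGetValue(key, out value); }
        finally { _lock.ExitReadLock(); }
    }

    public bool TrySet(TKey key, TValue value)
    {
        if (!_lock.TryEnterWriteLock(_writeTimeout)) return false; // skip caching this time
        try { _items[key] = value; return true; }
        finally { _lock.ExitWriteLock(); }
    }

    public TValue GetOrLoad(TKey key, Func<TKey, TValue> load)
    {
        if (TryGet(key, out var cached)) return cached;
        var fresh = load(key); // the fallback always works, just more slowly
        TrySet(key, fresh);    // best effort; a failed write is ignored
        return fresh;
    }
}
```

`ReaderWriterLockSlim` is `IDisposable`, so a longer-lived implementation would also dispose it when the cache is torn down.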

*(and of course the well-known variant, "There are only two hard problems in Computer Science: cache invalidation, naming things, and off-by-one errors").

Jon Hanna
Thanks Jon. Your answer is useful, thanks :) But in this post my foremost concern is how to design the page tree API for the plugin authors. Making the tree mutable may bring us subtle bugs; making it immutable may make it hard for us to reset the properties (because there are no setters outside PageCacheItem).
Dylan Lin
Yes, that's true. There isn't a magic silver-bullet solution here, and you're going to have to balance the pros and cons and make a choice, but neither will be perfect. Possibly also, the best option won't be the best option down the line, as features are added and usage patterns change.
Jon Hanna