I'm looking for some strategies for accessing cached data that resides in an internal company web service – specifically, for preventing access to the cached data while the cache is being refreshed.

We have a .NET 3.5 C# web service running on a web farm that maintains a cache of a half-dozen or so datasets. This data consists of configuration-related items that are referenced by the 'real' business logic running in this web service, and it is also returned for client use. We're probably talking a total of a dozen or so tables with a few thousand records in them.

We implemented the caching mechanism using the MS Enterprise Library 4.1. There was no huge reason for choosing it over the ASP.NET cache except that we were already using Enterprise Library for some other things and we liked its cache expiration handling. This is the first time we have implemented caching here, so maybe I'm missing something fundamental…
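
(Aside: for anyone unfamiliar with the Enterprise Library caching block, here is a minimal sketch of the kind of usage described above. The cache key and the stored object are made up, but CacheFactory.GetCacheManager, Add, GetData, and the AbsoluteTime expiration are part of the EntLib 4.1 caching API.)

    using System;
    using Microsoft.Practices.EnterpriseLibrary.Caching;
    using Microsoft.Practices.EnterpriseLibrary.Caching.Expirations;

    public class ConfigCache
    {
        // Resolves the default cache manager defined in the application's config file.
        private readonly ICacheManager _cache = CacheFactory.GetCacheManager();

        public void Store(string key, object datasets)
        {
            // Expire the entry 15 minutes from now, matching the farm's refresh window.
            _cache.Add(key, datasets, CacheItemPriority.Normal, null,
                       new AbsoluteTime(TimeSpan.FromMinutes(15)));
        }

        public object Fetch(string key)
        {
            // Returns null once the item has expired or was never added.
            return _cache.GetData(key);
        }
    }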

This configuration data doesn't change very often – probably a couple of times a day. When it does change, we update the cache on the particular server the update request went to (the update process goes through the web service). For the other servers in the web farm (currently three servers in total), the cache expiration is set to 15 minutes, at which point the data is re-loaded from the single database that all servers in the farm hit. For our purposes, this delay between servers is acceptable (although I guess not ideal).

During this refresh process, other requests can come in that need the data. Since a request can arrive during the expiration/refresh window, there is momentarily no data in the cache, which obviously causes issues.

What are some strategies to resolve this? If this were a single-AppDomain WinForms-type application, we could hack something together that would prevent access during the refresh using class variables/loops, threading/mutexes, or some other singleton-like structure. But I'm leery of implementing something like that on a web farm. Should I be? Is a distributed caching mechanism the way to go instead of each server having its own cache? I would like to avoid that for now if I can and come up with some code to work around this problem. Am I missing something?

Thanks for any input.

UPDATE: I was going to use the lock keyword around the expiration action that refreshes the data, but I was worried about doing that on a web server. I think it would have worked, although it seems to me there would still be a (smaller) chance of grabbing data from the empty cache between the time it expired and the time the lock was taken (the expiration action occurs on another thread, I think). So what we did instead is: if there is no data in the cache during a regular request, we assume it is in the process of being refreshed and grab the data from the source directly. I think this will work, since we can assume the cache should be filled at all times – the initial fill occurs when the singleton class that holds the cache is created on the first web service request. So if the cache is empty, it truly means it is currently being filled, which normally only takes a few seconds, so only the requests arriving in that window will miss the cache.
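
Roughly, the fallback described in this update looks like the sketch below; LoadFromDatabase and the cache field are hypothetical stand-ins for whatever the real data layer and cache wrapper provide.

    using System.Data;
    using Microsoft.Practices.EnterpriseLibrary.Caching;

    public class ConfigDataProvider
    {
        private readonly ICacheManager _cache = CacheFactory.GetCacheManager();

        public DataSet GetConfigData(string key)
        {
            // Normal path: serve straight from the cache.
            DataSet cached = _cache.GetData(key) as DataSet;
            if (cached != null)
                return cached;

            // Empty cache: assume a refresh is in flight and go straight to the source.
            // A refresh only takes a few seconds, so this path stays rare.
            return LoadFromDatabase(key);
        }

        // Hypothetical data-access call standing in for the real one.
        private DataSet LoadFromDatabase(string key)
        {
            return new DataSet();
        }
    }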

If anyone with experience would like to shed any more light on this, it would be appreciated.

A: 

It sounds to me like you are already serving out stale data. So, if that is allowed, why don't you populate a new copy of the cache when you discover it's old, and only switch to using it once it's completely populated? (A sketch of this appears below.)

Sam Saffron
Yes, serving out stale data is fine (for up to 15 minutes). We don't really have a way to check whether the data is stale from one of the other servers in the farm (if that server was not the one that received the updated data) without going back to the database to check a timestamp or something, which we are trying to avoid.
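
For what it's worth, a minimal sketch of the "populate a new copy, then switch" idea, assuming a plain in-memory dictionary rather than the EntLib cache (all names are illustrative):

    using System.Collections.Generic;
    using System.Data;

    public class SwappableCache
    {
        // Readers always see either the old copy or the fully built new one.
        private volatile Dictionary<string, DataSet> _current =
            new Dictionary<string, DataSet>();

        public void Refresh()
        {
            // Populate a fresh copy on the side; readers keep using _current meanwhile.
            Dictionary<string, DataSet> fresh = LoadAllFromDatabase();

            // Reference assignment is atomic, so no reader ever sees a half-built cache.
            _current = fresh;
        }

        public DataSet Get(string key)
        {
            DataSet value;
            return _current.TryGetValue(key, out value) ? value : null;
        }

        // Hypothetical loader standing in for the real data access.
        private Dictionary<string, DataSet> LoadAllFromDatabase()
        {
            return new Dictionary<string, DataSet>();
        }
    }
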
A: 

It really depends on the updating logic. Where is it that you decide to update the cache? Can you propagate the update to all the servers in the farm? Then you should lock while updating (a sketch of this appears below). If your update process is initiated by a user action, can you let the other servers know that they should expire their caches?

The cache on one of the servers in the farm is updated when an update to the data is sent in (each of our clients gets a copy of the data). These updates are a semi-rare occurrence done by admin-level people (although when they do update, they tend to make a lot of changes). We currently have no way to propagate to or notify the other servers in the farm, since we aren't using a distributed caching mechanism and didn't want to write our own in some manner, such as using a central database flag. I thought it might be a bad idea to actually lock the cache while refreshing in a web server environment.
We actually 'solved' this by retrieving the data directly from the source database when the cache is empty – we assume that if the cache is empty, we are in the middle of refreshing it. While this could be expensive in terms of performance, we thought it was preferable to some sort of lock. It can only happen during a refresh, which occurs every 15 minutes, and then only during the few seconds it takes to refill the cache. That is an acceptable hit given the nature of the data and the low chance of this occurring.
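
For reference, "locking while updating" (as the answer above suggests) usually takes the double-checked shape sketched below, so that only one request per server rebuilds the cache while the others wait briefly; LoadFromDatabase is again a hypothetical loader.

    using System.Data;
    using Microsoft.Practices.EnterpriseLibrary.Caching;

    public class LockingConfigCache
    {
        private static readonly object RefreshLock = new object();
        private readonly ICacheManager _cache = CacheFactory.GetCacheManager();

        public DataSet GetOrRefresh(string key)
        {
            DataSet data = _cache.GetData(key) as DataSet;
            if (data == null)
            {
                lock (RefreshLock)
                {
                    // Double-check: another request may have refreshed while we waited.
                    data = _cache.GetData(key) as DataSet;
                    if (data == null)
                    {
                        data = LoadFromDatabase(key);
                        _cache.Add(key, data);
                    }
                }
            }
            return data;
        }

        // Hypothetical data-access call standing in for the real one.
        private DataSet LoadFromDatabase(string key)
        {
            return new DataSet();
        }
    }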