tags:

views:

279

answers:

2

I'm new to memcached. Is this code vulnerable to the expired cache race condition? How would you improve it?

$memcache = new Memcache;
$memcache->connect('127.0.0.1');
$arts = ($memcache===FALSE) ? FALSE : $memcache->get($qparams);
if($arts===FALSE) {
 $arts=fetchdb($q, $qparams);
 $memcache->add($qparams, $arts, MEMCACHE_COMPRESSED, 60*60*24*3);
}
if($arts<>FALSE) {
 // do stuff
} else {
 // empty dataset
}
  • $qparams contains the parameters to the query, so I'm using it as key.
  • $arts get's an array with all fields I need for every item.

Let's say that query X gets 100 rows. A little after row #50 is modified by another process (lets say that the retail price gets increased).

  • What should I do about the cache?
  • How can I know in row #50 is cached?
  • Should I invalidate ALL the entries in the cache? (sounds like overkill to me).
A: 

You have to invalidate any cached object that contains a modified item. Either you have to modify the cache mechanism to store items at a more granular level, or invalidate the entire entry.

It's basically the same as saying you're caching the entire DB in a single cache-entry. You either expire it or you don't.

Cylindric
+2  A: 

Is this code vulnerable to the expired cache race condition? How would you improve it?

Yes. If two (or more) simultaneous clients try to fetch the same key from the cache and end up pulling it from the database. You will have spikes on the database and for periods of time the database will be under heavy load. This is called cache stampede. There are a couple of ways to handle this:

  • For new items preheat the cache (basically means that you preload the objects you require before the site goes live).
  • For items that expire periodically create an expire time that is a bit in the future than the actual expire time (lets say 5-10 minutes). Then when you pull the object from the cache, check if the expire time is close, cache into the future to prevent any other client from updating the cache, and update from the database. For this to work with no cache stampedes you would need to either implement key locking or use cas tokens (would require the latest client library to work).

For more info check the memcached faq.

Let's say that query X gets 100 rows. A little after row #50 is modified by another process (lets say that the retail price gets increased).

You have three types of data in cache:

  1. Objects
  2. Lists of Objects
  3. Generated data

What I usually do is to keep the objects as separate keys and then use cache "pointers" in lists. In your case you have N objets somewhere in cache (lets say the keys are 1,2..N), and then you have your list of objects in an array array(1,2,3,10,42...). When you decide to load the list with objects, you load the list key from cache, then load the actual objects from cache (using getMulti to reduce requests). In this case if any of the object gets updated, you update it in one spot only and it is automatically updated everywhere (not to mention that you save huge amount of space with this technique).

Edit: Decided to add a bit more info regarding the lookahead time expiration.

You set up your object with an expiration data x and save it into the database with an expiration date of x+5minutes. This are the steps you take when you load the object from the cache:

  1. Check if it is time to update (time() - x < 0)
  2. If so, lock the key so nobody can update it while you are refreshing the item. If the you cannot lock the key, then somebody else is already updating the key, and it becomes a SEP (Somebody Else's Problem). Since memcached has no solution for locks, you have to devise your own mechanism. I usually do this by adding a separate key with the original keys value + ":lock" at the end. You must set this key to expire in the shortest amount possible (for memcached that is 1 second).
  3. If you obtained a lock on the key, you first save the object with a new expiration time (this way you are sure no other clients will try to lock the key), then go about your business and update the key from the database and save the new value again with the appropriate lookahead expirations (see point 1).

Hope this clears everything up :)

Miha Hribar
For the expire time in te future. I don't get it. Let's say that my "close expire" time is 5mins, and I expect item#5432 to expire at 17:00hs. At 16:55hs four different users request item#5432. They all will hit the db...I'll look into locking and cas tokens, but I don't understand the benefit of doing this x mins before.
The Disintegrator
If you do that 5 minutes before the object expires the rest of the clients can still use the stale object while you load the fresh one from the database. Grant it you still have to use locking and cas tokens to prevent everybody from updating at once. You could do a probabilistic guess and update with higher probability the closer to the expiration time you get. Either way you want to have only one client accessing the database. You could even set up a cron that would to that for you in the background, but that would demand a very specific looking object :)
Miha Hribar
I have edited the answer with more details. Hope it's clearer now :)
Miha Hribar
Ok, I'll try this.Know any good resource about Memcached::cas? the php documentation is bare bones at best
The Disintegrator
The best resource about memcached is the protocol specification itself http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt
Miha Hribar