Is this code vulnerable to the expired cache race condition? How would you improve it?
Yes. If two (or more) simultaneous clients try to fetch the same key from the cache and end up pulling it from the database. You will have spikes on the database and for periods of time the database will be under heavy load. This is called cache stampede. There are a couple of ways to handle this:
- For new items preheat the cache (basically means that you preload the objects you require before the site goes live).
- For items that expire periodically create an expire time that is a bit in the future than the actual expire time (lets say 5-10 minutes). Then when you pull the object from the cache, check if the expire time is close, cache into the future to prevent any other client from updating the cache, and update from the database. For this to work with no cache stampedes you would need to either implement key locking or use cas tokens (would require the latest client library to work).
For more info check the memcached faq.
Let's say that query X gets 100 rows. A little after row #50 is modified by another process (lets say that the retail price gets increased).
You have three types of data in cache:
- Objects
- Lists of Objects
- Generated data
What I usually do is to keep the objects as separate keys and then use cache "pointers" in lists. In your case you have N objets somewhere in cache (lets say the keys are 1,2..N
), and then you have your list of objects in an array array(1,2,3,10,42...)
. When you decide to load the list with objects, you load the list key from cache, then load the actual objects from cache (using getMulti
to reduce requests). In this case if any of the object gets updated, you update it in one spot only and it is automatically updated everywhere (not to mention that you save huge amount of space with this technique).
Edit: Decided to add a bit more info regarding the lookahead time expiration.
You set up your object with an expiration data x
and save it into the database with an expiration date of x+5minutes
. This are the steps you take when you load the object from the cache:
- Check if it is time to update (
time() - x < 0
)
- If so, lock the key so nobody can update it while you are refreshing the item. If the you cannot lock the key, then somebody else is already updating the key, and it becomes a SEP (Somebody Else's Problem). Since memcached has no solution for locks, you have to devise your own mechanism. I usually do this by adding a separate key with the original keys value
+ ":lock"
at the end. You must set this key to expire in the shortest amount possible (for memcached that is 1 second).
- If you obtained a lock on the key, you first save the object with a new expiration time (this way you are sure no other clients will try to lock the key), then go about your business and update the key from the database and save the new value again with the appropriate lookahead expirations (see point 1).
Hope this clears everything up :)