views: 295
answers: 3

Hi,

We are trying to update memcached objects when we write to the database, to avoid having to read them back from the database after inserts/updates.

For our forum post object we have a ViewCount field containing the number of times a post is viewed.

We are afraid that we are introducing a race condition by updating the memcached object, as the same post could be viewed at the same time on another server in the farm.

Any idea how to deal with this kind of issue? It would seem that some sort of locking is needed, but how do you do it reliably across servers in a farm?

+1  A: 

If you're dealing with data that doesn't necessarily need to be updated in real time, and to me the view count is one of those, then you could add an expires field to the objects that are stored in memcache.

Once that expiration hits, the application goes back to the database and reads the fresh value, but until then it leaves the cached copy alone.

Of course for new posts you may want this refreshed more often, but you can code for that.
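A minimal sketch of that expiry approach. The cache class below is a hypothetical in-memory stand-in for a memcached client (the key layout, the `load_from_db` callback, and the 60-second TTL are all made up for illustration; a real client would set the expiry server-side):

```python
import time

class TtlCache:
    """Toy in-memory stand-in for a memcached client (hypothetical,
    for illustration only): values expire after a given TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: caller falls back to the DB
            return None
        return value

def get_view_count(cache, post_id, load_from_db, ttl=60):
    """Return the cached count; on a miss or expiry, reread from
    the database and repopulate the cache."""
    key = f"post:{post_id}:views"
    count = cache.get(key)
    if count is None:
        count = load_from_db(post_id)
        cache.set(key, count, ttl)
    return count
```

Between expirations every read is served from the cache; the count is at most `ttl` seconds stale, which is the trade-off Nathan describes.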

Memcache only stores one copy of your object in one of its instances, not in many of them, so I wouldn't worry about object locking or anything. That is for the database to handle, not your cache.

Edit:

Memcache offers no guarantee that, when you're getting and setting from multiple servers, your data won't get clobbered.

From memcache docs:

  • A series of commands is not atomic. If you issue a 'get' against an item, operate on the data, then wish to 'set' it back into memcached, you are not guaranteed to be the only process working on that value. In parallel, you could end up overwriting a value set by something else.

Race conditions and stale data

One thing to keep in mind as you design your application to cache data, is how to deal with race conditions and occasional stale data.

Say you cache the latest five comments for display in a sidebar in your application. You decide that the data only needs to be refreshed once per minute. However, you neglect to remember that this sidebar display is rendered 50 times per second! Thus, once 60 seconds roll around and the cache expires, suddenly 10+ processes are running the same SQL query to repopulate that cache. Every time the cache expires, a sudden burst of SQL traffic will result.

Worse yet, you have multiple processes updating the same data, and the wrong one ends up updating the cache. Then you have stale, outdated data floating about.

One should be mindful of possible issues in populating or repopulating the cache. Remember that the sequence of checking memcached, fetching from SQL, and storing back into memcached is not atomic at all!
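The lost update those docs warn about can be reproduced deterministically by interleaving two read-modify-write cycles by hand; a plain Python dict stands in for the cache (the key name is made up for illustration):

```python
# A plain dict stands in for memcached; the interleaving below is the
# classic lost update: both workers read 0, both write back 1.
cache = {"post:42:views": 0}

# Worker A and worker B each run: v = get(key); set(key, v + 1)
a = cache["post:42:views"]        # A reads 0
b = cache["post:42:views"]        # B reads 0 before A writes
cache["post:42:views"] = a + 1    # A writes 1
cache["post:42:views"] = b + 1    # B writes 1, clobbering A's update

print(cache["post:42:views"])     # 1, not the expected 2
```

In production the interleaving is driven by scheduling across the farm rather than written out line by line, but the outcome is the same: one view is silently lost.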

Nathan
The problem is that we would like the view count (in this case, but there are other situations with the same problem) to be updated live: you click the post and the view count increases. We would also like the cache object to live as long as possible, for performance reasons of course.
Micael
You can't guarantee what you're attempting to do in the way you want to do it. What memcache gives you is scalability, not raw performance.
Nathan
+1  A: 

Memcached operations are atomic. The server process will queue the requests and serve each one completely before going on to the next, so there's no need for locking.

Edit: memcached has an increment command (incr), which is atomic. You just have to store the counter as a separate value in the cache.
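A sketch of what Javier suggests. The client below is a hypothetical in-memory stand-in whose incr performs the whole read-modify-write under one lock, mirroring the atomicity that memcached's incr gives you server-side (the key name and counts are made up for illustration):

```python
import threading

class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client;
    its incr is atomic, as memcached's incr is server-side."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}

    def set(self, key, value):
        with self._lock:
            self._store[key] = value

    def get(self, key):
        with self._lock:
            return self._store.get(key)

    def incr(self, key, delta=1):
        # The whole read-modify-write happens under one lock, so
        # concurrent increments can never clobber each other.
        with self._lock:
            self._store[key] = self._store.get(key, 0) + delta
            return self._store[key]

client = FakeMemcache()
client.set("post:42:views", 0)

# Four workers each record 1000 views concurrently.
threads = [
    threading.Thread(
        target=lambda: [client.incr("post:42:views") for _ in range(1000)]
    )
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(client.get("post:42:views"))  # 4000: no updates were lost
```

Contrast this with the get/modify/set cycle: the same four workers doing separate get and set calls would lose updates, exactly as the memcached docs describe.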

Javier
True, but in this case we would be GETting the item, incrementing the ViewCount and PUTting it again, and as Nathan also states, this is not an atomic operation.
Micael
+1  A: 

I'm thinking: could a solution be to store the ViewCount separately from the Post object, and then do an INCR on it? Of course this would require reading two separate values from memcached when displaying the information.
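That two-key layout can be sketched like this; the dict is again a hypothetical in-memory stand-in for a memcached client, and the key names and sample post are made up for illustration:

```python
import json

# Hypothetical in-memory stand-in for memcached; the two-key layout
# is the point, not the storage.
store = {}

def cache_post(post_id, post, views):
    # The post body and the counter live under separate keys, so the
    # counter can be bumped with an atomic incr without rewriting
    # the whole serialized object.
    store[f"post:{post_id}"] = json.dumps(post)
    store[f"post:{post_id}:views"] = views

def view_post(post_id):
    # One incr for the counter, one get for the object: two round
    # trips instead of one, which is the cost of this layout.
    store[f"post:{post_id}:views"] += 1  # stands in for client.incr()
    post = json.loads(store[f"post:{post_id}"])
    return post, store[f"post:{post_id}:views"]

cache_post(42, {"title": "Hello", "body": "First!"}, views=0)
post, views = view_post(42)
print(post["title"], views)  # Hello 1
```

The extra read may matter less than it looks: memcached's get accepts multiple keys, so a client that supports multi-get can fetch the post and the counter in one round trip.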

Micael