
Hi

We use memcache basically as an afterthought, just to cache query results.

Invalidation is a nightmare because of the way it was implemented. We have since learned some memcache techniques by reading the mailing list, for example the trick that allows group invalidation of a bunch of keys. If you already know it, skip the next paragraph.

For those who don't know it and are interested: the trick is to add a sequence number to your keys and store that sequence number in memcache. Every time, before you do your "get", you fetch the current sequence number and build your keys around it. To invalidate the whole group, you just increment that sequence number; the old keys are never requested again and eventually get evicted.
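A minimal sketch of the sequence-number trick, in PHP to match the code elsewhere in this thread. The `ArrayCache` class is a hypothetical in-memory stand-in for a memcache client (so the example runs without a server), and `groupSeq`, `groupKey` and `invalidateGroup` are illustrative names, not part of any library.

```php
<?php
// Hypothetical in-memory stand-in for a memcache client so the sketch runs
// without a server. get() returns false on a miss, like Memcached::get().
class ArrayCache
{
    private $data = [];

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set($key, $value)
    {
        $this->data[$key] = $value;
        return true;
    }

    // Atomic in real memcache; returns the new value, or false if the key is missing.
    public function increment($key)
    {
        if (!array_key_exists($key, $this->data)) {
            return false;
        }
        return ++$this->data[$key];
    }
}

// Read the group's current sequence number, creating it on first use.
function groupSeq(ArrayCache $cache, $group)
{
    $seq = $cache->get("seq:$group");
    if ($seq === false) {
        $seq = 1;
        $cache->set("seq:$group", $seq);
    }
    return $seq;
}

// Every key in the group embeds the current sequence number.
function groupKey(ArrayCache $cache, $group, $key)
{
    return "$group:" . groupSeq($cache, $group) . ":$key";
}

// Bumping the sequence number orphans every old key in the group at once.
function invalidateGroup(ArrayCache $cache, $group)
{
    if ($cache->increment("seq:$group") === false) {
        $cache->set("seq:$group", 1);
    }
}

$cache = new ArrayCache();
$k1 = groupKey($cache, 'user42', 'profile');   // "user42:1:profile"
$cache->set($k1, ['name' => 'Ann']);
invalidateGroup($cache, 'user42');
$k2 = groupKey($cache, 'user42', 'profile');   // "user42:2:profile"
var_dump($cache->get($k2));                    // bool(false): a miss
```

One caveat of this simple version: if memcache evicts the sequence key itself and it is recreated at 1, entries written under an earlier 1 could be served again, so seeding the counter with something less predictable (such as a timestamp) may be safer.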

So anyway, I'm currently revising our model to implement this.

My question is this:

We didn't know about this pattern, and I'm sure there are others we don't know about. I've searched but haven't been able to find any design patterns or best practices on the web for using memcache.

Can someone point me to something like this, or even just write up an example? I'd like to make sure we don't make a beginner's mistake in our refactoring.

+5  A: 

One point to remember with object caching is that it is just that: a cache of objects and complex structures. A lot of people make the mistake of hitting their cache for straightforward, efficient queries, which adds the overhead of a cache check (plus a cache write on every miss) when the database would have returned the result far faster on its own.

This piece of advice is one I've taken to heart since it was taught to me: know when not to cache, that is, when the overhead cancels out the perceived benefit. I know it doesn't answer the specific question here, but I thought it was worth pointing out as a general hint.
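A back-of-the-envelope way to check whether a cache pays for itself, assuming a cache-aside pattern where every request pays for a cache lookup and each miss additionally pays for the database query and a cache write. With hit rate h, lookup cost c, query cost d and write cost s, the expected cost is c + (1 - h)(d + s), and caching only helps when that is less than d. The timings below are made-up illustrative values, not measurements, and `expectedWithCache` and `cachingPaysOff` are hypothetical helpers.

```php
<?php
// Expected per-request latency with cache-aside: every request pays the
// cache lookup; misses also pay the database query and the cache write.
function expectedWithCache($hitRate, $lookupMs, $dbMs, $writeMs)
{
    return $lookupMs + (1 - $hitRate) * ($dbMs + $writeMs);
}

// Caching only helps if the expected cost beats querying the database directly.
function cachingPaysOff($hitRate, $lookupMs, $dbMs, $writeMs)
{
    return expectedWithCache($hitRate, $lookupMs, $dbMs, $writeMs) < $dbMs;
}

// Illustrative numbers only.
var_dump(cachingPaysOff(0.5, 0.5, 0.4, 0.5));  // fast query: 0.5 + 0.5 * 0.9 = 0.95 ms > 0.4 ms, bool(false)
var_dump(cachingPaysOff(0.9, 0.5, 50.0, 0.5)); // slow query: 0.5 + 0.1 * 50.5 = 5.55 ms < 50 ms, bool(true)
```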

Rob
Thanks Rob, very good point.
+3  A: 

What Rob is saying is good advice. In my experience, there are two common ways to identify and invalidate cache records: unique identification and tag-based identification. The two are usually combined into a complete solution in which:

  1. A cache record is assigned a unique identifier (which usually depends somehow on the data that it caches) and optionally any number of tags.
  2. Cache records are recalled by their unique identifier.
  3. Cache records can be invalidated by their unique identifier (one at a time), or by any tag they are tagged with (possibly invalidating multiple records at the same time).

This is relatively simple to implement and generally works very well. I have yet to come across a system that needed more, though there are probably some edge cases out there that require specific solutions.
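One possible way to build the unique-identifier-plus-tags scheme on a store without native tags is to keep a version counter per tag, the same idea as the sequence-number trick in the question. This is a hypothetical sketch, not Zend_Cache's implementation: `TaggedCache` and its methods are illustrative names, and a plain PHP array stands in for memcache.

```php
<?php
// Hypothetical tag-aware cache; a plain array stands in for memcache.
// Each record remembers the tag versions it was saved under, and bumping
// a tag's version makes every record carrying that tag read as a miss.
class TaggedCache
{
    private $store = [];

    private function tagVersion($tag)
    {
        $key = "tag:$tag";
        if (!isset($this->store[$key])) {
            $this->store[$key] = 1;
        }
        return $this->store[$key];
    }

    // Save a record under its unique identifier, with any number of tags.
    public function save($id, $data, array $tags = [])
    {
        $versions = [];
        foreach ($tags as $tag) {
            $versions[$tag] = $this->tagVersion($tag);
        }
        $this->store["rec:$id"] = ['data' => $data, 'tags' => $versions];
    }

    // Recall by unique identifier; false on a miss or if any tag was invalidated.
    public function load($id)
    {
        if (!isset($this->store["rec:$id"])) {
            return false;
        }
        $record = $this->store["rec:$id"];
        foreach ($record['tags'] as $tag => $version) {
            if ($this->tagVersion($tag) !== $version) {
                return false;
            }
        }
        return $record['data'];
    }

    // Invalidate a single record by its unique identifier.
    public function remove($id)
    {
        unset($this->store["rec:$id"]);
    }

    // Invalidate every record tagged with $tag.
    public function cleanTag($tag)
    {
        $this->store["tag:$tag"] = $this->tagVersion($tag) + 1;
    }
}

$cache = new TaggedCache();
$cache->save('user_42', ['name' => 'Ann'], ['users']);
$cache->save('post_7', ['title' => 'Hi'], ['posts']);
$cache->cleanTag('users');
var_dump($cache->load('user_42'));         // bool(false)
var_dump($cache->load('post_7')['title']); // string(2) "Hi"
```

Tag invalidation is lazy here: `cleanTag()` touches a single counter, and stale records are detected on the next `load()`. In a real memcache version, the trade-off is fetching each tag's counter on every read.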

Eran Galperin
How do you implement the tagging? Are you using this hacked memcached (http://code.google.com/p/memcached-tag/), or are you doing it in your app? Can you post an example?
I am using something similar to how Zend_Cache implements it. You can read about it at http://framework.zend.com/manual/en/zend.cache.theory.html#zend.cache.tags and check out the source for the actual implementation.
Eran Galperin
+2  A: 

I use the Zend Cache component (you don't have to use the entire framework, just the Zend_Cache classes if you want). It abstracts away some of the caching details. It supports grouping cache entries by 'tags', but that feature is not supported by the memcache backend, so I've rolled my own support for tags, which was relatively easy. The pattern I use for functions that access the cache (generally in my models) is:

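public function getBySlug($slug, $ignoreCache = false)
{
    // Build the query first so the cache key can be derived from its SQL text
    $select = $this->select()
        ->where('slug = ?', $slug);
    $cacheKey = md5((string) $select); // e.g. a hash of the query

    // load() returns false on a miss; $ignoreCache forces a fresh query
    if ($ignoreCache || !$result = $this->cache->load($cacheKey))
    {
        $result = $this->fetchRow($select);

        try
        {
            $this->cache->save($result, $cacheKey);
        }
        catch (Zend_Exception $error)
        {
            // log the exception; a failed cache write should not break the read
        }
    }
    else
    {
        $this->registry->logger->info($cacheKey . ' came from cache');
    }
    return $result;
}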

Basing the cache key on a hash of the query means that if another developer bypasses my models, or uses another function elsewhere that runs the same query, the result is still pulled from the cache. Generally I tag each cache entry with a couple of generated tags: one is the name of the table and the other is the name of the function. So by default, whenever a table sees an insert, update or delete, our code invalidates the cached items tagged with that table's name. All in all, caching is pretty automatic in our base code, and developers can trust that caching 'just works' in our projects. A nice side effect of tagging is that we have a page that offers granular cache clearing and management, with options to clear the cache by model function or by table.
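A sketch of what hand-rolled tags on a tagless backend could look like, combining the two ideas above: the key is an md5 hash of the SQL text, and each tag keeps an index of the keys saved under it, so a write to a table can delete them all. `QueryCache`, its methods and the `q_`/`tag_` key prefixes are hypothetical, a plain array stands in for memcache, and this is not the code described in this answer.

```php
<?php
// Hypothetical hand-rolled tagging: keys are an md5 hash of the SQL text,
// and each tag (e.g. a table name) maps to the set of keys saved under it.
// A plain array stands in for memcache.
class QueryCache
{
    private $store = [];

    public static function keyFor($sql)
    {
        return 'q_' . md5($sql);
    }

    // Returns false on a miss.
    public function load($sql)
    {
        $key = self::keyFor($sql);
        return array_key_exists($key, $this->store) ? $this->store[$key] : false;
    }

    public function save($sql, $result, array $tags)
    {
        $key = self::keyFor($sql);
        $this->store[$key] = $result;
        foreach ($tags as $tag) {
            // Add the key to the tag's index. Against real memcache this
            // read-modify-write would need guarding against concurrent writers.
            $index = isset($this->store["tag_$tag"]) ? $this->store["tag_$tag"] : [];
            $index[$key] = true;
            $this->store["tag_$tag"] = $index;
        }
    }

    // Call after an INSERT, UPDATE or DELETE on $table.
    public function invalidateTable($table)
    {
        $tagKey = "tag_$table";
        if (!isset($this->store[$tagKey])) {
            return;
        }
        foreach (array_keys($this->store[$tagKey]) as $key) {
            unset($this->store[$key]);
        }
        unset($this->store[$tagKey]);
    }
}

$cache = new QueryCache();
$sql = "SELECT * FROM articles WHERE slug = 'hello'";
$cache->save($sql, ['id' => 1], ['articles', 'getBySlug']);
$cache->invalidateTable('articles'); // e.g. after an UPDATE on articles
var_dump($cache->load($sql));        // bool(false)
```

One caveat of the index approach: if memcache evicts a tag's index key, the entries it listed can no longer be found by that tag and stay until they expire, so giving cached results a lifetime limits the damage.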

Akeem
+1  A: 

We also store the query results from our database (PostgreSQL) in memcache, and we use triggers on the tables to invalidate the cache. There are several APIs for this (e.g. pgmemcache; I think MySQL has something similar, but I don't know for sure). The benefit is that the database itself (through triggers) handles invalidation whenever data changes (update, insert, delete), so you don't need to write all of that into your application.

Endlessdeath