views:

503

answers:

2

I am deleting a single path from the Django cache like this:

from models                   import Graph
from django.http              import HttpRequest
from django.utils.cache       import get_cache_key
from django.db.models.signals import post_save
from django.core.cache        import cache

def expire_page(path):
    request      = HttpRequest()
    request.path = path
    key          = get_cache_key(request)
    if cache.has_key(key):   
        cache.delete(key)

def invalidate_cache(sender, instance, **kwargs):
    expire_page(instance.get_absolute_url())

post_save.connect(invalidate_cache, sender = Graph)

This works - but is there a way to delete recursively? My paths look like this:

/graph/123
/graph/123/2009-08-01/2009-10-21

Whenever the graph with id "123" is saved, the cache for both paths needs to be invalidated. Can this be done?

A: 

Checkout shutils.rmtree() or os.removedirs(). I think the first is probably what you want.

Update based on several comments: Actually, the Django caching mechanism is more general and finer-grained than just using the path for the key (although you can use it at that level). We have some pages that have 7 or 8 separately cached subcomponents that expire based on a range of criteria. Our component cache names reflect the key objects (or object classes) and are used to identify what needs to be invalidated on certain updates.

All of our pages have an overall cache-key based on member/non-member status, but that is only about 95% of the page. The other 5% can change on a per-member basis and so is not cached at all.

How you iterate through your cache to find invalid items is a function of how it's actually stored. If it's files you can use simply globs and/or recursive directory deletes, if it's some other mechanism then you'll have to use something else.

What my answer, and some of the comments by others, are trying to say is that how you accomplish cache invalidation is intimately tied to how you are using/storing the cache.

Second Update: @andybak: So I guess your comment means that all of my commercial Django sites are going to explode in flames? Thanks for the heads up on that. I notice you did not attempt an answer to the problem.

Knipknap's problem is that he has a group of cache items that appear to be related and in a hierarchy because of their names, but the key-generation logic of the cache mechanism obliterates that name by creating an MD5 hash of the path + vary_on. Since there is no trace of the original path/params you will have to exhaustively guess all possible path/params combinations, hoping you can find the right group. I have other hobbies that are more interesting.

If you wish to be able to find groups of cached items based on some combination of path and/or parameter values you must either use cache keys that can be pattern matched directly or some system that retains this information for use at search time.

Because we had needs not-unrelated to the OP's problem, we took control of template fragment caching -- and specifically key generation -- over 2 years ago. It allows us to use regexps in a number of ways to efficiently invalidate groups of related cached items. We also added a default timeout and vary_on variable names (resolved at run time) configurable in settings.py, changed the ordering of name & timeout because it made no sense to always have to override the default timeout in order to name the fragment, made the fragment_name resolvable (ie. it can be a variable) to work better with a multi-level template inheritance scheme, and a few other things.

The only reason for my initial answer, which was indeed wrong for current Django, was because I have been using saner cache keys for so long I literally forgot the simple mechanism we walked away from.

@knipknap: if you would like to discuss this issue further and/or would like some code, contact me at [email protected].

Peter Rowell
Not *quite* what I meant. Thanks though.
knipknap
Well, then you need to be more specific. Where is your cache? Files? memcached? Database? If they are in files, then my answer is in the right direction. If they are somewhere else then you need some mechanism to match all cached items that contain your modified object. We solved this problem (and several other cache invalidation problems) by implementing our own caching mechanism. It's not that hard to do.
Peter Rowell
Sorry to downvote but the original question was perfectly clear to anyone that uses Django!
andybak
+4  A: 

You might want to consider employing a generational caching strategy, it seems like it would fit what you are trying to accomplish. In the code that you have provided, you would store a "generation" number for each absolute url. So for example you would initialize the "/graph/123" to have a generation of one, then its cache key would become something like "/GENERATION/1/graph/123". When you want to expire the cache for that absolute url you increment its generation value (to two in this case). That way, the next time someone goes to look up "/graph/123" the cache key becomes "/GENERATION/2/graph/123". This also solves the issue of expiring all the sub pages since they should be referring to the same cache key as "/graph/123".

Its a bit tricky to understand at first but it is a really elegant caching strategy which if done correctly means you never have to actually delete anything from cache. For more information here is a presentation on generational caching, its for Rails but the concept is the same, regardless of language.

jkupferman