views:

321

answers:

7

Just wondering how you guys manage your cache invalidations. Given that there might be hundreds or thousands of objects in the cache whose invalidation might be triggered by different algorithms or rules, how do you keep track of it all?

Is there any way you can reference the relationships from a table in the database and enforce them somehow?

Do bear with me as I've never done any caching before.

+1  A: 

See this article and related Stack Overflow question.

In general, cache invalidation can be rather tricky, especially when cached objects are updated.

Juha Syrjälä
thanks for the link, it was very useful!
Nai
A: 

For general solutions you can look at the link provided by Juha.

But to answer your question, I'd like to describe how it is done in our project.
We do not use any general solution for caches. Our cache grew gradually. Initially we had no intention of using a cache at all, but later the cache was born. As the cache was added to the system late, it is not aware of any "database" or other "smart things". Instead, we carefully check every place where the code changes the cached data. So, I'd call our cache "algorithm-driven".

(The only really necessary general thing is the functionality to handle a cache miss.
Another thing worth attention is identifying clients: if you have multiple clients, one cache may not be enough... But for both problems only specific solutions were added, not general ones!)
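Handling a cache miss could look roughly like the sketch below. This is only an illustration, not the code from our project: `ReadThroughCache` and its `loader` callback are hypothetical names, and the loader stands in for whatever slow source (database, device) the real value comes from.

```python
class ReadThroughCache:
    """Minimal sketch: on a miss, fetch the value from a slow source and remember it."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callback: key -> value from the slow source
        self._data = {}
        self.misses = 0

    def get(self, key):
        if key not in self._data:
            # Cache miss: ask the slow source once and keep the result.
            self.misses += 1
            self._data[key] = self._loader(key)
        return self._data[key]

    def invalidate(self, key):
        # Forget the entry so the next get() reloads it.
        self._data.pop(key, None)
```

For multiple clients, one simple option under the same assumptions would be to keep one such cache instance per client.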

I know that describing such basic functionality may sound silly. One may say that "we should have used a normal cache in the first place". But you know, in reality some things are just out of your control and you have to do the best you can.

So to summarize: we need no general solution. Our algorithms control the cache. This keeps the cache small (both in code and in memory during run time). That is our approach.

avp
great, hope to get more answers like this
Nai
+3  A: 

The purpose of your cache layer should be pretty much this: reflecting the corresponding data in your database, but providing it faster than the database would, or at least providing it without keeping the database busy.

To achieve this, you have two solutions:

  1. know the exact lifespan of everything you store in cache
  2. keep your cache up-to-date with your database

The first is pretty rare, but pretty easy to deal with: just refresh your cache on a regular basis, matching the known lifespan of the data.
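For the first case, one possible variant is to let entries expire after their known lifespan and reload them on the next read, instead of refreshing on a timer. The sketch below assumes a single lifespan for all entries; `TTLCache` and the injectable `clock` parameter are names made up for this example.

```python
import time


class TTLCache:
    """Minimal sketch: every entry expires a fixed number of seconds after it is stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested without sleeping
        self._data = {}      # key -> (expires_at, value)

    def put(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The lifespan is over: drop the entry and report a miss.
            del self._data[key]
            return default
        return value
```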

The second point is what you'll most likely deal with in your projects: just update your cache when your database is updated. It's simpler than you'd think:

  • Add a new object to your cache right after you successfully added it to your database.
  • Update an object in your cache right after you successfully updated it in your database.
  • Delete an object from your cache right after you successfully deleted it in your database.
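The three rules above can be sketched as follows. This is only an illustration under simplifying assumptions: plain dicts stand in for both the database and the cache, and `Repository` is a hypothetical name. The point is the ordering: the cache is touched only after the database operation has succeeded.

```python
class Repository:
    """Minimal sketch: every successful database write is mirrored in the cache."""

    def __init__(self, db, cache):
        self.db = db        # dict standing in for the database (assumption)
        self.cache = cache  # dict standing in for the cache layer (assumption)

    def add(self, key, obj):
        if key in self.db:
            raise KeyError(f"{key!r} already exists")
        self.db[key] = obj       # database first ...
        self.cache[key] = obj    # ... cache only after the write succeeded

    def update(self, key, obj):
        if key not in self.db:
            raise KeyError(key)
        self.db[key] = obj
        self.cache[key] = obj

    def delete(self, key):
        del self.db[key]            # raises KeyError if the row does not exist
        self.cache.pop(key, None)   # reached only after a successful delete
```

If the database operation raises, the cache line is never reached, so the cache does not get ahead of the database.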

If your code is clean enough, it should be easy to implement an efficient cache policy on top of it. There's a little more about caching and how to do it well in that answer I posted some time ago. Hopefully this will all help you :)

Nicolas
But what if the code is not "clean enough"? (Just in case it is not your code, but legacy code.) And also, what if you have no database in the system, but slow devices you communicate with instead?
avp
"Clean enough" code makes it easier, but my point is the same with crappy legacy code. What's important here is the logic you have to stick to: add the code that will ensure that your cached data is always up to date with the data in your DB. Messy code will only make you spend more time ensuring that every DB operation is followed by the correct cache add/update/delete operation. (Followed up in the next comment...)
Nicolas
And working with slow devices instead of a database, or whatever you work with, is not a "problem" either. What's important is generally working with up-to-date data, not hitting the cache every time. If you favor low latency over data freshness, you have other problems to deal with before optimisation, IMO.
Nicolas
A: 

If you are using SQL Server 2005 or later and .NET, you may want to look into using the SqlDependency class. What this class does is use the SQL Server Service Broker to notify you when certain modifications have taken place on your data. You can use this as a trigger to invalidate your cache. Again, this only applies if you're using those technologies.

Jacob
To whoever down-voted this answer, it would benefit the community if you explained why.
Jacob
+2  A: 

As you seem to have worked out, it's not as simple as, for example, updating the cache of a news story when a news story updates. There are other relationships, for example, lists of latest news stories that you need to update.

The simplest way to do this is to link together all objects that are related. I've previously used the concept of cache groups. Continuing my news example, the cache group 'news' would contain: the news story, the various lists of news stories and anything else that contains news stories.

When I edit a news story, the system recognises that it needs to update the cache group 'news' and goes through the following process...

  1. get each object before the save of updates
  2. save
  3. get the object again; if it's different, update the various caches
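A cache group could be sketched like this. Note that this is a simplification and not the exact process above: instead of comparing each object before and after the save, it simply drops every entry in the group so they are rebuilt on the next read. `GroupedCache` and the key names are made up for the example.

```python
from collections import defaultdict


class GroupedCache:
    """Minimal sketch: entries are tagged with group names and invalidated per group."""

    def __init__(self):
        self._data = {}
        self._groups = defaultdict(set)  # group name -> keys tagged with that group

    def put(self, key, value, groups=()):
        self._data[key] = value
        for group in groups:
            self._groups[group].add(key)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def invalidate_group(self, group):
        # Drop every entry tagged with this group; return the keys that were tagged.
        keys = self._groups.pop(group, set())
        for key in keys:
            self._data.pop(key, None)
        return keys
```

With this, saving a news story would call `invalidate_group("news")` once, rather than the save code having to know every list that contains the story.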

That's a very simple example, of course. A much neater way of going about it is to write your code to always maintain the object as it would be in the cache.

If you add a tag to the news article, your code could just write those changes to the database. But if you instead update the news article object and the relevant tag object, both objects can 'know' they have changed (as simple as setting hasChanged = true), and you can then update the cache and save out to the database automatically.
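A rough sketch of the hasChanged idea is shown below. Everything here is illustrative: `TrackedObject`, `flush`, and the dicts standing in for the database and the cache are assumptions, and the flag is spelled `has_changed` to follow Python naming.

```python
class TrackedObject:
    """Minimal sketch: an object that remembers whether any field was modified."""

    def __init__(self, key, **fields):
        self.key = key
        self.fields = dict(fields)
        self.has_changed = False

    def set(self, name, value):
        # Mark the object as changed only when the value really differs.
        if name not in self.fields or self.fields[name] != value:
            self.fields[name] = value
            self.has_changed = True


def flush(objects, db, cache):
    """Write every changed object to the database and the cache, then clear its flag."""
    written = []
    for obj in objects:
        if obj.has_changed:
            db[obj.key] = dict(obj.fields)     # save to the database (a dict here)
            cache[obj.key] = dict(obj.fields)  # keep the cached copy identical
            obj.has_changed = False
            written.append(obj.key)
    return written
```

Adding a tag would then mean calling `set` on both the article and the tag object, followed by a single `flush` that updates only those two.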

Salgo
+1 for cache groups. I've been all over the Internet for the past several hours looking for ideas for how to handle cache invalidation, and this is the first place I've seen this mentioned.
Aaron
A: 

Just wondering how you guys manage your cache invalidations. Given that there might be hundreds or thousands of objects in the cache whose invalidation might be triggered by different algorithms or rules, how do you keep track of it all?

I'm not sure I clearly understand this part, but I think that you should define different "regions" (as in Hibernate terminology), each with its own content and rules.
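To make the idea concrete, here is a small language-neutral sketch in Python. It is not Hibernate's API; `RegionCache`, the region names and the per-region lifespan rule are all assumptions chosen for illustration.

```python
import time


class RegionCache:
    """Minimal sketch: named regions, each with its own expiry rule and its own content."""

    def __init__(self, region_ttls, clock=time.monotonic):
        self._ttls = dict(region_ttls)  # region -> lifespan in seconds, None = never expires
        self._clock = clock             # injectable so expiry can be tested without sleeping
        self._regions = {name: {} for name in self._ttls}

    def put(self, region, key, value):
        ttl = self._ttls[region]
        expires_at = None if ttl is None else self._clock() + ttl
        self._regions[region][key] = (expires_at, value)

    def get(self, region, key, default=None):
        entry = self._regions[region].get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._regions[region][key]
            return default
        return value

    def evict_region(self, region):
        # Invalidate everything in one region without touching the others.
        self._regions[region].clear()
```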

Is there any way you can reference the relationships from a table in the database and enforce them somehow?

The persistence layer is the best place to do this, in my opinion, as it is aware of what is happening with persistent and potentially cached entities. Hibernate, for example, supports (second-level) caching and lets you define, per entity, the name of the second-level cache region and the caching strategy (read-only, read-write, nonstrict-read-write, transactional). Hibernate actually defines an interface and lets you plug in a cache implementation, depending on your needs (the cache type, the supported strategies, the cluster support).

Do bear with me as I've never done any caching before.

Depending on the complexity of your needs, this might not be a simple task. Maybe you should use or look at existing solutions. In the Java world, EHCache, OSCache, SwarmCache and JBoss Cache 2 are invalidating caches (or support invalidation). This is just a suggestion, as you didn't mention any language.

Pascal Thivent
A: 

checking for spam

arpan