Hi,

I am currently thinking about caching strategies and, more importantly, about avoiding any duplication of data inside the cache. My query is largely language agnostic but very much programming related.

My question concerns the efficient caching of paged or filtered data, and beyond that, distributed caching. For the latter I have decided to go with memcached, more specifically a .NET port of it. I have seen a commercial alternative in the form of NCache, but memcached seems perfectly acceptable to me and is apparently used on Facebook, MySpace, etc.

My query, then, concerns a strategy by which you can hold objects in the cache and also reference them from paged data. If I have 100 items and I page them, I could cache the IDs of products 1-10 as one entry and cache each product separately. If I were to sort the items in descending order, items 1-10 would be different products, so I would not want to store the actual objects every time the paging/sorting/filtering changed; instead I would store only the IDs of the objects, so that I could then perform a transactional lookup in the database for any that do not already exist in the cache or are invalid.
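Here is a minimal sketch of that two-level idea in Python, using a plain dict as a stand-in for the memcached client and a fake in-memory table as a stand-in for the database (every name in it is made up for illustration):

    # Minimal sketch: a page entry holds only IDs, while each object is
    # cached exactly once under its own key. A plain dict stands in for
    # the memcached client and FAKE_DB stands in for the real database.
    cache = {}
    FAKE_DB = {i: {"id": i, "name": f"product {i}"} for i in range(1, 101)}

    def page_key(page, size, sort, direction, filter_=""):
        return f"paged_{page}_{size}_{sort}_{direction}_{filter_}"

    def query_ids_from_db(page, size, direction):
        ids = sorted(FAKE_DB, reverse=(direction == "desc"))
        return ids[(page - 1) * size : page * size]

    def load_products_from_db(ids):
        # In real life this would be one batched, transactional SELECT.
        return [FAKE_DB[i] for i in ids]

    def get_page(page, size, sort="id", direction="asc"):
        key = page_key(page, size, sort, direction)
        ids = cache.get(key)
        if ids is None:  # page of IDs not cached yet
            ids = query_ids_from_db(page, size, direction)
            cache[key] = ids
        products = {i: cache.get(f"product_{i}") for i in ids}
        missing = [i for i, p in products.items() if p is None]
        if missing:  # fetch only what the cache lacks
            for product in load_products_from_db(missing):
                cache[f"product_{product['id']}"] = product
                products[product["id"]] = product
        return [products[i] for i in ids]

    print([p["id"] for p in get_page(1, 10, direction="desc")])  # 100 .. 91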

My initial idea for a cache key was this:

paged_<pageNumber><pageSize><sort><sortDirection>[<filter>]

I would then iterate through the cache keys and remove any that start with "paged_". My question, ultimately, is whether anyone knows of any patterns or strategies for caching this kind of data, such as paged data, while also making sure that objects are not cached more than once.
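One complication I am aware of: stock memcached has no supported way to enumerate keys or delete them by prefix, so the sweep above would have to run against a client-side record of the keys. A workaround I have seen described is to bake a "generation" number into every paged key and invalidate by bumping that number, letting the orphaned entries simply expire. A rough sketch, with all names made up:

    cache = {}  # dict standing in for memcached, as before

    def generation():
        # Current generation for paged keys, itself stored in the cache.
        gen = cache.get("paged_generation")
        if gen is None:
            gen = 1
            cache["paged_generation"] = gen
        return gen

    def page_key(page, size, sort, direction, filter_=""):
        return f"paged_g{generation()}_{page}_{size}_{sort}_{direction}_{filter_}"

    def invalidate_all_pages():
        # No key iteration needed: bumping the generation makes every old
        # paged_g<n>_* key unreachable; memcached evicts them over time.
        cache["paged_generation"] = generation() + 1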

memcached itself is native code and fast, but even with a client-side record of the keys, it is obvious that the more items there are in the cache, the more time a key-by-key sweep would take. I am interested to hear whether anyone knows of a solution or theory for this type of problem that is currently being employed; I am sure there will be some. Thank you for your time.

TIA

Andrew

A: 

I once tried what I think is a similar caching strategy and found it unwieldy. I eventually ended up just caching the objects that make up the pages and generating the pages on every request. Ten cache hits to construct a page should give (hopefully) sub-second response times, pretty much instant to the users of your service.

If you must cache entire pages (I think of them as result sets), then perhaps you could run the user request through a hash and use that as your cache key. It's a hard problem to pin down with a concrete example or code (for me at least).
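That said, the hashing step itself is mechanical enough; a minimal sketch in Python, where the parameter names are invented:

    import hashlib

    def request_cache_key(page, size, sort, direction, filter_=""):
        # Normalize the request parameters, then hash them so the key stays
        # short and uniform no matter how long the filter string gets.
        raw = f"{page}|{size}|{sort}|{direction}|{filter_}".lower()
        return "paged_" + hashlib.sha1(raw.encode("utf-8")).hexdigest()

    print(request_cache_key(1, 10, "name", "desc", "color=red"))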

Gandalf