I'm currently in the process of building a repository for a project that will be DB-intensive (performance tests have been carried out and caching is needed, hence the question).

The way I've got it set up now is that each object is cached individually. If I want to run a query for those objects, I pass the query to the database and it returns the required IDs. (For some simple queries I cache and manage the IDs myself.)

I then hit the cache with these IDs and pull the objects out; any missing objects are bundled into a "WHERE IN" statement and fired at the database, at which point I repopulate the cache with the missing IDs.
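
In rough C#, the read path looks something like this (the cache object and LoadFromDbWhereIn are simplified stand-ins for my actual repository plumbing):

    using System.Collections.Generic;
    using System.Linq;

    public IList<Customer> GetByIds(IList<int> ids)
    {
        var found = new Dictionary<int, Customer>();
        var missing = new List<int>();

        // First pass: pull whatever we can straight from the cache.
        foreach (int id in ids)
        {
            var cached = cache.Get<Customer>("Customer:" + id);
            if (cached != null)
                found[id] = cached;
            else
                missing.Add(id);
        }

        // Single round trip for the rest: SELECT ... WHERE Id IN (...)
        if (missing.Count > 0)
        {
            foreach (var customer in LoadFromDbWhereIn(missing))
            {
                cache.Set("Customer:" + customer.Id, customer);
                found[customer.Id] = customer;
            }
        }

        // Return the objects in the originally requested order.
        return ids.Where(found.ContainsKey).Select(id => found[id]).ToList();
    }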

The queries themselves are mostly about paging and ordering the data.

Is this a suitable strategy, or are there perhaps better techniques available?

Thanks Tony

A: 

I wouldn't advise a custom caching strategy; caching is hard. Depending on your platform of choice, you might want to choose a third-party caching library/tool.

Manrico Corazzi
+3  A: 

This is a reasonable approach. I have gone this route before, and it works best for simple caching.

However, when you are updating or writing to the database you will run into some interesting problems, and you should handle these scenarios carefully.

For example, your cached data will become stale if a user updates a record in the database. In that scenario you will need to either update the in-memory cache at the same time or purge the entry so that it gets refreshed on the next fetch.

Things can also get tricky when, for example, the user updates a customer's email address that lives in a separate table but is associated via a foreign key: the cached customer object has to be invalidated as well.
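
A minimal sketch of that write path, reusing the hypothetical cache keys from your question (db stands in for whatever data access layer you use): persist first, then evict every cache entry the change can affect.

    // Hypothetical update path: write to the database first, then keep
    // the cache consistent by evicting everything the change touches.
    public void UpdateEmail(int customerId, string newEmail)
    {
        db.Execute(
            "UPDATE EmailAddresses SET Email = @Email WHERE CustomerId = @Id",
            new { Email = newEmail, Id = customerId });

        // Evict (or refresh) the entry for the row itself...
        cache.Remove("Email:" + customerId);

        // ...and also the aggregate that embeds it via the foreign key,
        // otherwise the cached Customer keeps serving the old address.
        cache.Remove("Customer:" + customerId);
    }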

Besides database caching, you should also consider output caching. This works quite well if, for example, you have a table that shows sales data for the previous month. The table could be rendered to a separate file that gets included in a number of other pages that want to show it. If you cache that rendered file, then whenever those pages request it, the caching engine can fetch it straight from disk and the business logic layer doesn't even get hit. This is not applicable all the time, but it is quite useful for custom controls.
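
In ASP.NET, for instance, the user control that renders the sales table could be output-cached with a single directive at the top of the .ascx (the duration below is just an illustrative value):

    <%@ OutputCache Duration="3600" VaryByParam="None" %>

With that in place the control's rendered HTML is reused for an hour instead of being regenerated on every request (add Shared="true" if you want one cached copy reused across all pages that include the control).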

Unit of Work Pattern

It also helps to know about the Unit of Work pattern.

When you're pulling data in and out of a database, it's important to keep track of what you've changed; otherwise, that data won't be written back into the database. Similarly you have to insert new objects you create and remove any objects you delete.

You can change the database with each change to your object model, but this can lead to lots of very small database calls, which ends up being very slow. Furthermore it requires you to have a transaction open for the whole interaction, which is impractical if you have a business transaction that spans multiple requests. The situation is even worse if you need to keep track of the objects you've read so you can avoid inconsistent reads.

A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.
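
A bare-bones sketch of the pattern (the abstract hooks are hypothetical; mature ORMs such as NHibernate implement all of this for you in their session object):

    using System.Collections.Generic;
    using System.Data;

    // Tracks changes during a business transaction and flushes them
    // to the database in a single batch/transaction at the end.
    public abstract class UnitOfWork
    {
        private readonly List<object> newObjects = new List<object>();
        private readonly List<object> dirtyObjects = new List<object>();
        private readonly List<object> removedObjects = new List<object>();

        public void RegisterNew(object o) { newObjects.Add(o); }

        public void RegisterDirty(object o)
        {
            if (!dirtyObjects.Contains(o)) dirtyObjects.Add(o);
        }

        public void RegisterRemoved(object o) { removedObjects.Add(o); }

        public void Commit()
        {
            // One transaction for the whole batch instead of one
            // tiny database call per object change.
            using (IDbTransaction tx = BeginTransaction())
            {
                foreach (var o in newObjects) Insert(o);
                foreach (var o in dirtyObjects) Update(o);
                foreach (var o in removedObjects) Delete(o);
                tx.Commit();
            }
        }

        // Hooks onto your actual data access layer.
        protected abstract IDbTransaction BeginTransaction();
        protected abstract void Insert(object o);
        protected abstract void Update(object o);
        protected abstract void Delete(object o);
    }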

aleemb
When any data is updated the cache is updated, and it updates the table fine too. I've used the repository pattern in this project and all data access goes through it exclusively. The repository itself always hits the cache, then the DB (if the cache is empty or the data is being updated).
TWith2Sugars
If you haven't already and if the option is available to you, why not consider an ORM?
aleemb
+1  A: 

If you are using SQL Server, you can use SqlCacheDependency, which evicts the cached entry automatically when the underlying table changes in the database, so it gets repopulated on the next fetch. Here's the link for SqlCacheDependency

This link contains a similar cache dependency solution. (It's for a file rather than a DB; you will need to make some changes, as per the MSDN link above, to have a cache dependency on the DB.)
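
A minimal sketch of the usage, assuming the SQL Server polling model; "Northwind", "Customers", and customersTable are illustrative names, and "Northwind" must match a <sqlCacheDependency> database entry in web.config:

    using System.Web;
    using System.Web.Caching;

    public static class CustomerCache
    {
        // The Customers table must first be enabled for change notifications,
        // e.g. via: aspnet_regsql.exe -S server -d Northwind -et -t Customers
        public static void CacheCustomers(object customersTable)
        {
            var dependency = new SqlCacheDependency("Northwind", "Customers");

            // The entry is evicted automatically when the Customers table
            // changes, so the next read misses and repopulates from the DB.
            HttpContext.Current.Cache.Insert("customers", customersTable, dependency);
        }
    }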

Hope this helps :)

Rashmi Pandit