I have a large DataTable cached in my web app that is the result of a complex query returning a large data set. Although the DataTable is cached, the query that runs to "refresh" the cache still takes a long time, largely due to the sheer amount of data being returned.

In order to speed this up, I am considering implementing a timestamp-type approach on my tables in order to limit my query to only return rows which have changed.

I then intend to merge this smaller dataset with my cached datatable.

Has anyone done anything similar to this, or is there anything out there that handles this already?

I feel this could be a re-inventing the wheel situation if I dive straight in.

+1  A: 

Personally, I've used the timestamp approach before and that does work well - it does make the caching more efficient by only retrieving the data that has changed since the last read.
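To make the timestamp idea a bit more concrete, here is a minimal sketch, assuming SQL Server and the classic System.Data.SqlClient provider. The table dbo.Items, its columns Id, Name and RowVer (a rowversion column), and the class and method names are all hypothetical, not taken from the question. The caller keeps the highest RowVer value it saw on the previous refresh and passes it in as lastVersion.

```csharp
using System.Data;
using System.Data.SqlClient;

static class ChangedRowsLoader
{
    // Hypothetical schema: dbo.Items(Id, Name, RowVer), where RowVer is a rowversion column.
    public const string ChangedRowsSql =
        "SELECT Id, Name, RowVer FROM dbo.Items WHERE RowVer > @lastVersion";

    // Builds the parameterised command. lastVersion is the 8-byte rowversion
    // high-water mark remembered from the previous refresh.
    public static SqlCommand BuildCommand(SqlConnection connection, byte[] lastVersion)
    {
        var command = new SqlCommand(ChangedRowsSql, connection);
        command.Parameters.Add("@lastVersion", SqlDbType.Binary, 8).Value = lastVersion;
        return command;
    }

    // Returns only the rows inserted or updated since lastVersion.
    public static DataTable LoadChangedRows(string connectionString, byte[] lastVersion)
    {
        var changed = new DataTable("Items");
        using (var connection = new SqlConnection(connectionString))
        using (var command = BuildCommand(connection, lastVersion))
        using (var adapter = new SqlDataAdapter(command))
        {
            adapter.Fill(changed); // Fill opens and closes the connection itself
        }
        return changed;
    }
}
```

Two things to keep in mind with a sketch like this: Fill does not set a primary key on the returned DataTable by default, and a query of this shape only sees inserted or updated rows, so deleted rows would need to be tracked some other way.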

Alternatively, I'd suggest the SqlCacheDependency class, which takes care of keeping the cache up to date for you. I can't comment on any real-world pros and cons of this, or on how its performance compares with the timestamp approach, as I haven't used it myself.

There's another useful article on SqlCacheDependency here

Update: Yes, I don't think it will actually refresh the data. It sounds like you'd have to do that yourself. From the 2nd link:

When the data changes—and only then—the cache items based on that data are invalidated and removed from the cache. The next time you request that item from the cache, if it is not in the cache, you can re-add the updated version to the cache and be assured that you have the latest data

There are also SQL Server 2005-specific implementation notes in the 2nd link:

SQL Server 2005 monitors changes to the result set of a particular SQL command. If a change occurs in the database that would modify the results set of that command, the dependency causes the cached item to be invalidated. This allows SQL Server 2005 to provide row-level notification.

I personally think I'd go for the timestamp approach (that's what I've done before), as I can't see, on the face of it, that SqlCacheDependency would give any performance benefit. I think it would be less performant, just easier to implement. One day, I'll get round to actually trying out SqlCacheDependency to do a proper performance analysis :)

Update 2: Regarding the merging of new data into the existing DataTable, I think the DataTable.Merge method is what you want.

The Merge method is used to merge two DataTable objects that have largely similar schemas. A merge is typically used on a client application to incorporate the latest changes from a data source into an existing DataTable.
...
...
When merging a new source DataTable into the target, any source rows with a DataRowState value of Unchanged, Modified, or Deleted are matched to target rows with the same primary key values. Source rows with a DataRowState value of Added are matched to new target rows with the same primary key values as the new source rows.

You just need to ensure you define the column(s) on the DataTable that make up the primary key.
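As a rough illustration of the merge step, here is a minimal, in-memory sketch. The Id/Name schema, the table name and the class and method names are made up for the example; in the real app the "changed" table would come from the timestamp-filtered query rather than being built by hand.

```csharp
using System;
using System.Data;

static class CacheMergeDemo
{
    // Builds an empty table with a hypothetical schema: Id (primary key) and Name.
    public static DataTable CreateTable()
    {
        var table = new DataTable("Items");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        // Without a primary key, Merge cannot match rows and just appends them.
        table.PrimaryKey = new[] { table.Columns["Id"] };
        return table;
    }

    public static DataTable MergeDemo()
    {
        // Stands in for the DataTable already sitting in the cache.
        var cached = CreateTable();
        cached.Rows.Add(1, "alpha");
        cached.Rows.Add(2, "beta");
        cached.AcceptChanges();

        // Stands in for the smaller result set of changed rows.
        var changed = CreateTable();
        changed.Rows.Add(2, "beta (edited)"); // existing key: updates the cached row
        changed.Rows.Add(3, "gamma");         // new key: added to the cached table
        changed.AcceptChanges();

        cached.Merge(changed);
        return cached;
    }

    static void Main()
    {
        foreach (DataRow row in MergeDemo().Rows)
            Console.WriteLine($"{row["Id"]}: {row["Name"]}");
    }
}
```

After the merge the cached table holds three rows: row 1 is untouched, row 2 carries the edited name, and row 3 is new. Both tables need the same primary key definition for the matching to work.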

AdaTheDev
I thought the SqlCacheDependency was just a way of indicating that your data is out of date. The key thing I'm looking at doing here is only returning the out of date records rather than retrieving the whole datatable. Do you know if it is capable of doing this?
Robin Day
Yes, I believe you're right (updated notes in my answer). It would be nice if it could refresh just the changed data but I don't think it does this - if it could, I'm not sure how new data would be dealt with anyway without re-executing the full, original query. Hence, I think the timestamp approach is the best.
AdaTheDev
Thanks for the update. It's as I thought. The key thing I'm after is more any ideas around how to update the cached DataTable with the changed records. I thought there might be a built-in "merge" function.
Robin Day
Do you mean ideas for how to merge a 2nd DataTable containing the updated records into the original cached DataTable after you've queried the updated records? Or do you mean ideas whereby you wouldn't have to query by timestamp yourself?
AdaTheDev
The ways to merge the cached DataTable with the newly queried changed rows. I have no problem with using timestamps; the problem is more how to get the changes into my cached DataTable with the minimum of hassle.
Robin Day
I thought that was what you meant, just wanted to make sure in case I went off on a tangent :) See Update 2.
AdaTheDev
It took me a while... but many thanks, I did get the Merge method to work eventually!!!
Robin Day
Nice one. +1 for the question by the way!
AdaTheDev