views:

241

answers:

4

Hi all

Background to question: I'm looking to implement a caching system for my website. Currently we're exploring memcache as a means of doing this. However, I am looking to see if something similar exists for SQL Server. I understand that MySQL has query cache which although is not distributed works as a sort of 'stop gap' measure. Is MySQL query cache equivalent to the buffer cache in SQL Server?

So here are my questions:

  1. Is there a way to know is currently stored in the buffer cache?
  2. Follow up to this, is there a way to force certain tables or result sets into the cache
  3. How much control do I have over what goes on in the buffer and procedure cache? I understand there used to be a DBCC PINTABLE command but that has since been discontinued.
  4. Slightly off topic: Should the caching even exists on the database layer? Or it is more prudent to manage caches using Velocity/Memcache? Is so, why? It seems like cache invalidation is something of a pain when handling many objects with overlapping triggers.

Thanks!

+3  A: 

Nai,

Answers to your questions follow:

  1. From Wiki - Always correct... ? :-). For a more Microsoft answer, here is their description on Buffer Cache.

    Buffer management

    SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in-memory, and the set of all pages currently buffered is called the buffer cache. The amount of memory available to SQL Server decides how many pages will be cached in memory. The buffer cache is managed by the Buffer Manager. Either reading from or writing to any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than the on-disc version. The page is updated on the disc by the Buffer Manager only if the in-memory cache has not been referenced for some time. While writing pages back to disc, asynchronous I/O is used whereby the I/O operation is done in a background thread so that other operations do not have to wait for the I/O operation to complete. Each page is written along with its checksum when it is written. When reading the page back, its checksum is computed again and matched with the stored version to ensure the page has not been damaged or tampered with in the meantime.

  2. For this answer, please refer to the above answer:

    Either reading from or writing to any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than the on-disc version.

  3. You can query the bpool_commit_target and bpool_committed columns in the sys.dm_os_sys_info catalog view to return the number of pages reserved as the memory target and the number of pages currently committed in the buffer cache, respectively.

  4. I feel like Microsoft has had time to figure out caching for their product and should be trusted.

I hope this information was helpful,

Thanks!

Scott
+3  A: 

SQL Server implements a buffer pool same way every database product under the sun does (more or less) since System R showed the way. The gory details are explain in Transaction Processing: Concepts and Techniques. I addition it has a caching framework used by the procedure cache, permission token cache and many many other caching classes. This framework is best described in Clock Hands - what are they for.

But this is not the kind of caching applications are usually interested in. The internal database cache is perfect for scale-up scenarios where a more powerfull back end database is able to respond faster to more queries by using these caches, but the modern application stack tends to scale out the web servers and the real problem is caching the results of query interogations in a cache used by the web farm. Ideally, this cache should be shared and distributed. Memcached and Velocity are examples of such application caching infrastructure. Memcache has a long history by now, its uses and shortcommings are understood, there is significant know-how around how to use it, deploy it, manage it and monitor it.

The biggest problem with caching in the application layer, and specially with distributed caching, is cache invalidation. How to detect the changes that occur in the back end data and mark cached entries invalid so that new requests don't use stale data.

The simplest (for some definition of simple...) alternative is proactive invalidation from the application. The code knows when it changes an entity in the database, and after the change occurs it takes the extra step to mark the cached entries invalid. This has several short commings:

  • Is difficult to know exactly which cached entries are to be invalidated. Dependencies can be quite complex, things are always more that just a simple table/entry, there are aggregate queries, joins, partitioned data etc etc.
  • Code discipline is required to ensure all paths that modify data also invalidate the cache.
  • Changes to the data that occur outside the application scope are not detected. In practice, there are always changes that occur outside the application scope: other applications using the same data, import/export and ETL jobs, manual intervention etc etc.

A more complicated alternative is a cache that is notified by the database itself when changes occur. Not many technologies are around to support this though, it cannot work without an active support from the database. SQL Server has Query Notifications for such scenarios, you can read more about it at The Mysterious Notification. Implementing QN based caching in a standalone application is fairly complicated (and often done badly) but it works fine when implemented correctly. Doing so in a shared scaled out cache like Memcached is quite a feats of strength, but is doable.

Remus Rusanu
Great answer, thanks.
Nai
I don't suppose you have good links for this in your bookmarks? Like the short comings of memcache like you mentioned? Thanks
Nai
I'm not a memcached expert by any stretch. http://krow.livejournal.com/ is the place to look :)
Remus Rusanu
A: 

Caching can take many different meaning for an ASP.Net application spread from the browser all the way to your hardware with the IIS, Application, Database thrown in the middle.

The caching you are talking about is Database level caching, this is mostly transparent to your application. This level of caching will include buffer pools, statement caches etc. Make sure your DB server has plenty of RAM. In theory a DB server should be able to load the entire DB store in memory. There is not much you can do at this level unless you pre-fetch some anticipated data when you start the application and ensure that it is in DB cache.

On the other hand is in-memory distributed caching system. Apart from memcache and velocity, you can look at some commercial solutions like NCache or Oracle Coherence. I have no experience in either of them to recommend. This level of caching promises scalability at a cheaper cost. It is expensive to scale the DB tier compared to this. You may have to consider aspects like network bandwidth though. This type of caching, specially with invalidation and expiry can be complicated

You can cache at Web Service tier using output caching at IIS level (in IIS 7) and ASP.Net level.
At the application level you can use ASP.Net cache. This is the one that you can control most and gives you good benefits.

Then there is caching going on at client web proxy tier that can be controlled by cache-control HTTP header.

Finally you have browser level caching, view state and cookies for small data.

And don't forget that hardware like SAN caches at physical disk access level too.

In summary caching can occur at many levels and it for you to analyse and implement the best solution for your scenario. You have find out stability and volatility of your data, expected load etc. I believe caching at ASP.Net level (specially for objects) gives you most flexibility and control.

Pratik
A: 

Your specific technical questions about SQL Server's buffer cache are going down the wrong path when it comes to "implement a caching system for my website".

Sure, SQL Server is going to cache data so it can improve its performance (and it does so rather well), but the point of implementing a caching layer on your web front-ends is to avoid from having to talk to the database at all - because there is still overhead and resource contention even when your query is fulfilled entirely from SQL Server's cache.

You want to be looking into is: memcached, Velocity, ASP.NET Cache, P&P Caching Application Block, etc.

Bryan Batchelder