I run a very high-traffic (10M impressions a day), high-revenue web site built with .NET. The core meta data is stored in SQL Server. My team and I have a unique caching strategy: a middle-tier server queries the database for new meta data at regular intervals, serializes the data to files, and sends those files to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory for real-time requests.

The advantages of this model:

  1. The web nodes cache all data in memory and incur no I/O overhead querying a database.
  2. If the database ever goes down, unexpectedly or for a maintenance window, the web servers continue to run and generate revenue. You can even bring up a web server without retrieving its initial data from the DB, because all the data it needs is in files on its own disks.
  3. We are completely horizontally scalable: if throughput suffers, we just add a web server.

The disadvantage is that this caching and persistence layer adds complexity to the code that queries the database, packages the data, and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years, and there are probably better ways to tackle this.
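
To make the plumbing concrete, here is a minimal sketch of the pack/unpack steps, assuming BinaryFormatter serialization (the type, path, and method names are illustrative, not our actual code):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    [Serializable]
    public class MetaDataEntry { public int Id; public string Name; }

    public static class MetaDataPlumbing
    {
        // Middle tier: package fresh meta data into a file pushed to the web nodes.
        public static void WriteSnapshot(List<MetaDataEntry> entries, string path)
        {
            using (var stream = File.Create(path))
                new BinaryFormatter().Serialize(stream, entries);
        }

        // Web node: unpackage the file into the in-memory cache at startup
        // or on each refresh interval.
        public static List<MetaDataEntry> ReadSnapshot(string path)
        {
            using (var stream = File.OpenRead(path))
                return (List<MetaDataEntry>)new BinaryFormatter().Deserialize(stream);
        }
    }

Every new entity means another pair of methods like these, which is the complexity I'm describing.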

One strategy I have been considering is using SQL Server replication to push our master database to local database instances installed on each web server. The web server application would use normal SQL/ORM techniques to instantiate objects. We could still sustain a master database outage, and we would not have to write specialized caching code; instead we could use NHibernate to handle the persistence.
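
Under that model, data access collapses to ordinary NHibernate calls against the local instance; a rough sketch (mapping and session-factory setup assumed, names illustrative):

    using System.Collections.Generic;
    using NHibernate;

    public class MetaDataRepository
    {
        // Session factory configured against the web server's local replica.
        private readonly ISessionFactory _sessionFactory;

        public MetaDataRepository(ISessionFactory sessionFactory)
        {
            _sessionFactory = sessionFactory;
        }

        public IList<MetaDataEntry> GetAll()
        {
            using (ISession session = _sessionFactory.OpenSession())
            {
                // NHibernate materializes the objects; no hand-rolled
                // serialize/ship/deserialize plumbing per entity.
                return session.CreateCriteria(typeof(MetaDataEntry))
                              .List<MetaDataEntry>();
            }
        }
    }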

This seems like a more elegant solution, and I would like to hear what others think, or whether anyone has alternatives to suggest.

+1  A: 

Have you considered memcached? It:

  1. is in memory
  2. can run locally
  3. is fully horizontally scalable
  4. removes the need to re-cache on each web server

It may fit the bill. Check out Google for lots of details and usage stories.
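
As a rough sketch of how it could slot in, assuming the Enyim.Caching .NET client (this answer names no specific client, so treat the API details as an assumption):

    using System;
    using Enyim.Caching;
    using Enyim.Caching.Memcached;

    public static class MetaDataCache
    {
        // The client reads its memcached server list from app/web.config by default.
        private static readonly MemcachedClient Client = new MemcachedClient();

        public static T GetOrLoad<T>(string key, Func<T> loadFromDb) where T : class
        {
            var cached = Client.Get<T>(key);
            if (cached != null)
                return cached;

            // Cache miss: falls through to the database -- the startup
            // penalty path raised in the comment below.
            var fresh = loadFromDb();
            Client.Store(StoreMode.Set, key, fresh);
            return fresh;
        }
    }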

gahooa
My concern with memcached is initially populating it. Any object not yet in the cache has to be fetched from the database, so if the database is down when the memcached cluster fails, the startup penalty could be massive.
Matt Wrock
A: 

Have you considered using SqlDependency caching?

You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
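
A minimal sketch of the pattern (assuming Service Broker is enabled on the database; the connection string, table, and class names are placeholders):

    using System.Data.SqlClient;

    public class MetaDataWatcher
    {
        private readonly string _connString;

        public MetaDataWatcher(string connString)
        {
            _connString = connString;
            SqlDependency.Start(_connString); // once per app domain
        }

        public void LoadAndWatch()
        {
            using (var conn = new SqlConnection(_connString))
            // Notification queries require explicit columns and two-part table names.
            using (var cmd = new SqlCommand("SELECT Id, Name FROM dbo.MetaData", conn))
            {
                var dependency = new SqlDependency(cmd);
                // Fires once when the result set changes; re-query and re-subscribe.
                dependency.OnChange += (sender, e) => LoadAndWatch();

                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    // Materialize the rows into the in-memory cache here.
                }
            }
        }
    }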

In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.

If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.

In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).

RickNZ
A: 

Just an addition to what RickNZ proposed above. Since the master data you are caching doesn't change frequently, and probably only during a maintenance window, here is what you should do first on the database side:

Create a snapshot replication publication for the master tables you want to cache; adding new entities will be equally easy. On all the web servers, install SQL Express and subscribe to this publication. Since the data doesn't change frequently, you can rest assured there will be little server resource usage beyond the network trips that deliver the master data.

All the caching that was available via your previous mechanism is still available, minus the headache that comes with adding new entities.

Next, you can leverage the .NET mechanisms suggested above. You won't face a memcached-style cluster failure unless the web server itself goes down. There is a lot available in .NET that a .NET pro can point out after this stage.

Girish
+1  A: 

I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.

First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.

Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
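
For example (server and database names are placeholders), the mirrored connection string looks something like:

    // ADO.NET transparently retries against the partner if the primary is down.
    var connString =
        "Data Source=SqlPrimary;Failover Partner=SqlMirror;" +
        "Initial Catalog=CoreMetaData;Integrated Security=True;";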

Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.

My favorite starting point for scaling is using three separate connection strings in your application, and choosing the right one based on the needs of each query:

  • Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
  • Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
  • Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.

If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
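
A sketch of what that routing could look like (the names QueryIntent and ConnectionRouter are illustrative, not part of this answer):

    using System.Configuration;

    public enum QueryIntent { Realtime, NearRealtime, DelayedReporting }

    public static class ConnectionRouter
    {
        // Connection strings named in web.config; writes and mission-critical
        // reads go Realtime, everything else tolerates replication lag.
        public static string For(QueryIntent intent)
        {
            return ConfigurationManager
                .ConnectionStrings[intent.ToString()]
                .ConnectionString;
        }
    }

Usage would be something like new SqlConnection(ConnectionRouter.For(QueryIntent.NearRealtime)) for a read that can be a few minutes stale.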

Brent Ozar