views:

164

answers:

1

I'm working on a web product that is composed of many modular web applications. To the end user, it appears to be one single application, though various components are broken out into their own apps.

Part of the reasoning for this is so that it can be easily scaled horizontally across multiple application servers.

To facilitate easier horizontal scaling of the data tier, we are planning on using a web service layer in front of the database. This layer could be scaled out to N machines, and each instance of it would handle caching individually.

The idea is that an application would make a call to the service tier load balancer, which would assign the call to a service instance. That instance would then return data from its cache, or connect to the database and query for it. This seems like the easiest forward-looking way to scale out without heavily modifying application code.

[Database Tier x N machines]
         |
         v
[Service Tier x N machines]
         |
         v
[Application Tier x N machines]

Some questions come up, though. I'd like to persist user sessions at the service level, so that each application would just authenticate with a token. However, I'm uncertain how I'd maintain session data across all the service machines without creating a single point of failure.

Any ideas on how to pull this off? Any other thoughts on the architecture? Has anyone else designed a site that could potentially handle millions of hits per day?

EDIT: Not even an idea? :(

+2  A: 

Hi Jonathan, you've described the perfect use case for a distributed caching mechanism such as memcached (http://www.danga.com/memcached) or the upcoming MS Velocity project (http://code.msdn.microsoft.com/velocity).

In the situation you describe, where an increasing number of Service Tier instances each do their own local caching, the usefulness of your cache decreases with each new box: every instance must retrieve the same data from the database to populate its local cache, even if that data was just accessed by another Service Tier instance. With memcached or Velocity, the caching mechanism combines unused RAM across all your servers into a single logical cache shared by all the Service Tier installations. That way only the first Service Tier instance to access a piece of data needs to hit the database; subsequent accesses by other instances pull the same data from the cache.
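The way a memcached-style client spreads keys across the pooled servers can be sketched as a simple hash-based mapping (the server names are hypothetical; production clients typically use consistent hashing so that adding a box remaps only a fraction of the keys):

```python
import hashlib

# Hypothetical cache hosts shared by all Service Tier instances.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]

def server_for(key):
    """Deterministically map a key to one cache server, so every
    Service Tier instance agrees on where a given item lives."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

Because every instance computes the same mapping, an item cached by one box is found by all the others without any coordination.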

Having this in place also answers the user session question, as you could easily use this same cache to store state values for user sessions and all of your Service Tier instances would have access to this same information.
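Storing sessions this way can be sketched as follows (the dict again stands in for the shared cache, and the TTL value and helper names are assumptions for illustration, not a specific memcached or Velocity API):

```python
import time
import uuid

_session_cache = {}   # stand-in for the shared cache (memcached / Velocity)
SESSION_TTL = 1800    # 30-minute expiry; an assumed policy

def create_session(user_id):
    """Issue an opaque token and store the session state under it."""
    token = uuid.uuid4().hex
    _session_cache[token] = ({"user_id": user_id}, time.time() + SESSION_TTL)
    return token

def authenticate(token):
    """Any Service Tier instance can validate a token against the shared
    cache; returns the session dict, or None if unknown or expired."""
    entry = _session_cache.get(token)
    if entry is None:
        return None
    session, expires = entry
    if time.time() >= expires:
        del _session_cache[token]
        return None
    return session
```

Since the token is the only thing the application has to carry, any service instance behind the load balancer can authenticate it, which removes the need for sticky sessions.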

Hope this helps!

Adam

Adam Alexander