views: 541
answers: 4
How do large web sites which cannot be completely stateless achieve extreme scalability at the web tier?

There are sites like eBay and Amazon which cannot be completely stateless, as they have something like a shopping cart. It isn't feasible to encode every item in the shopping cart into the URL, nor is it feasible to encode every item into a cookie and send it on every connection. So Amazon simply stores the session id in the cookie that is being sent. So I understand that scaling the web tier of eBay and Amazon should be much harder than scaling the Google search engine, where everything can be encoded RESTfully into the URL.
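As a minimal sketch of that idea (not Amazon's actual code; the class and item names are made up for illustration), the cookie carries only an opaque session id, while the cart itself lives server-side, keyed by that id:

    import java.util.*;

    // Minimal sketch: the cookie holds only an opaque session id,
    // the cart state stays on the server, keyed by that id.
    public class SessionCartSketch {
        // server-side session store: session id -> cart items
        private static final Map<String, List<String>> carts = new HashMap<>();

        static String newSession() {
            String sessionId = UUID.randomUUID().toString();  // value that would go into a Set-Cookie header
            carts.put(sessionId, new ArrayList<>());
            return sessionId;
        }

        static void addToCart(String sessionId, String item) {
            carts.get(sessionId).add(item);                    // state never leaves the server
        }

        public static void main(String[] args) {
            String cookieValue = newSession();                 // e.g. "session-id=2f1c..." on the wire
            addToCart(cookieValue, "ISBN-0131103628");
            System.out.println(carts.get(cookieValue));
        }
    }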

On the other hand, both eBay and Amazon scale absolutely massively. Rumor has it that eBay runs some 15,000 J2EE application servers.

How do these sites handle both extreme scalability and statefulness? As the site is stateful, it isn't feasible to do simple DNS balancing. So one would assume that these companies have a hardware-based load balancer like BigIP, Netscaler or something like that, which is the sole device behind the single IP address of the site. This load balancer would terminate the SSL (if used), inspect the cookie and, depending on the session id in that cookie, decide which application server holds that customer's session.
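A rough sketch of that kind of cookie-based session affinity ("sticky sessions") might look like the following; the backend host names and the simple hashing rule are only illustrative assumptions, not any vendor's actual algorithm:

    import java.util.*;

    // Sketch of cookie-based session affinity on a load balancer:
    // read the session id out of the Cookie header and always map
    // that session to the same backend app server.
    public class StickyBalancerSketch {
        private static final List<String> appServers =
                List.of("app-01.internal", "app-02.internal", "app-03.internal");

        static String pickBackend(String cookieHeader) {
            String sessionId = Arrays.stream(cookieHeader.split(";\\s*"))
                    .filter(c -> c.startsWith("session-id="))
                    .map(c -> c.substring("session-id=".length()))
                    .findFirst()
                    .orElse("");
            int index = Math.floorMod(sessionId.hashCode(), appServers.size());
            return appServers.get(index);
        }

        public static void main(String[] args) {
            // same session id -> same backend on every request
            System.out.println(pickBackend("session-id=abc123; other=1"));
        }
    }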

But this just can't possibly work, can it? No single load balancer could handle the load of thousands of application servers; I would imagine that even these hardware load balancers do not scale to such a level.

Also, the load balancing is done transparently for the user, i.e. users are not forwarded to different addresses, but still all collectively stay at www.amazon.com the whole time.

So my question is: is there some special trick by which one can achieve something like transparent sharding of the web tier (not of the database tier, as is commonly done)? As long as the cookie isn't inspected, there is no way to know which application server holds the session.

Edit: I realized that transparency is only needed if the site has to be spidered and bookmarked. E.g. if the site is a mere web app, something like a plane or train ticket reservation system, there should be no problem with simply redirecting users to specific clusters of web servers behind different URLs, e.g. a17.ticketreservation.com. In this specific case, it would be feasible to just use multiple clusters of application servers, each behind its own load balancer. Interestingly, I did not find a site which uses this kind of concept.

Edit: I found this concept discussed at highscalability.com, where the discussion refers to an article by Lei Zhu named "Client Side Load Balancing for Web 2.0 Applications". Lei Zhu uses cross-scripting to do this client-side load balancing transparently.

Even if there are drawbacks, like bookmarking, XSS, etc., I do think that this sounds like an extremely good idea for certain special situations, namely almost content-free web applications that do not need to be spidered or bookmarked (e.g. ticket reservation systems and the like). Then there is no need to do the load balancing transparently.

There could be a simple redirect from the main site to the server, e.g. a redirect from www.ticketreservation.com to a17.ticketreservation.com. From there on, the user stays at a17. a17 is not a single server but a cluster in itself, by which redundancy can be achieved.

The initial redirect server could itself be a cluster behind a load balancer. This way, a really high scalability could be achieved, as the primary load balancer behind www is only hit once at the beginning of each session.

Of course, the redirect to different URLs looks extremely nasty, but with mere web applications (which do not need to be spidered, deep-linked or deep-bookmarked anyway), this should be only a cosmetic problem for the user?

The redirect-cluster could poll the load of the application clusters and adapt the redirects accordingly, thus achieving balancing and not mere load distribution.
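As a toy illustration of that redirect idea (purely an assumption about how it could be wired up; the cluster hostnames and load figures below are invented), the www cluster could pick the least loaded application cluster and answer with a 302 to its hostname:

    import java.util.*;

    // Sketch of the redirect idea: the www cluster picks the least loaded
    // application cluster and answers with a 302 pointing at its hostname.
    public class RedirectBalancerSketch {
        // load figures would come from periodically polling each cluster
        private static final Map<String, Double> clusterLoad = new HashMap<>(Map.of(
                "a16.ticketreservation.com", 0.82,
                "a17.ticketreservation.com", 0.35,
                "a18.ticketreservation.com", 0.60));

        static String redirectTarget() {
            return clusterLoad.entrySet().stream()
                    .min(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElseThrow();
        }

        public static void main(String[] args) {
            // would be sent as: HTTP/1.1 302 Found
            System.out.println("Location: https://" + redirectTarget() + "/");
        }
    }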

+2  A: 

You would probably have to be on the engineering team at one of these places to know for sure, but there are people who have made educated guesses from talks and other information that has come out of both companies:

Ebay Architecture and Amazon Architecture

A single load balancer by itself is, in today's world, kind of the equivalent of the DNS round robin of years past. Today you have things like anycast that let you play all kinds of tricks. You can be pretty sure that the likes of eBay and Amazon do use load balancers, and that they use a lot of them.

You may want to boil it down a little more when you think about how it might work, because a lot of the traffic is stateless. In a single request for a page there are potentially a lot of objects that don't need to know about the state. Take those objects out of the picture by serving them from a stateless system (this is where the anycast comes in) and the number of stateful requests goes down dramatically.

If that doesn't get you to the point where a single load balancer can handle the load, then the next step up is to split the traffic using IP routing and/or geo-DNS. Sites as large as eBay and Amazon will be in a number of different datacenters, with a large number of internet connections at each. You take everything coming in from internet pop quest-west and send it to the west coast datacenter "quest" servers, anything from att-west gets sent to the west coast datacenter "att" servers, anything from quest-east goes to the east coast datacenter "quest" servers, etc. Each of those systems could be an island with a single load balancer that could handle the load; some of the load balancers out there can handle hundreds of thousands of transactions a second, even SSL encrypted. On the back side you replicate in bulk to each datacenter constantly, but the data can be out of sync.

carson
Yes, I did read both articles at highscalability.com. I posted this question because I wasn't able to find anything about the load balancing there. Anycast is surely much more advanced than round robin, but it also does not provide stateful load balancing, as I understand it.
SAL9000
+2  A: 

You may find the following paper useful; it presents the design and implementation of a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience:

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.

Panos
+1  A: 

Easy. The web servers, which are stateless, are load balanced. The application servers (middle tier), which hold the session data, are not. The web servers can use your session id cookie to determine what app server to contact.

Memcached and Microsoft's Velocity are products that solve this exact need.

Edit: How does a web server know which app server to contact? That information is embedded in the session id (or derived from a hash of it), and could generically be done however you like. It could be as simple as your session id being server:guid. Memcached bases it on the hash, though.

The important bit is that the client has to be able to figure out which app server to contact in a stateless fashion. The easiest way to do that is to embed it into the key, though a registry (perhaps on its own tier) would work as well and could provide some fault tolerance.
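Here is a minimal sketch of the two lookup styles just described, under the assumption of a server:guid session id format and invented server names (memcached itself maps a key to a server by hashing it):

    import java.util.*;

    // Two ways a stateless tier can find the stateful server for a session:
    // 1) the owning server is embedded in the session id ("server:guid"),
    // 2) the server is derived from a hash of the session id (memcached-style).
    public class SessionRoutingSketch {
        private static final List<String> appServers =
                List.of("app-01", "app-02", "app-03", "app-04");

        static String fromEmbeddedId(String sessionId) {
            return sessionId.split(":", 2)[0];   // "app-03:guid" -> "app-03"
        }

        static String fromHash(String sessionId) {
            return appServers.get(Math.floorMod(sessionId.hashCode(), appServers.size()));
        }

        public static void main(String[] args) {
            System.out.println(fromEmbeddedId("app-03:0f8fad5b-d9cb-469f-a165-70867728950e"));
            System.out.println(fromHash("0f8fad5b-d9cb-469f-a165-70867728950e"));
        }
    }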

Edit2: Going back over some eBay interviews, I may have gotten the particulars of their implementation a bit wrong. They don't do caching, and they don't do state in the middle tier. What they do is have a load-balanced middle tier (app servers) partitioned by function. So they would have a pool of servers for, e.g., viewing items, and then another pool for selling items.

Those app servers have a "smart" DAL that routes to sharded databases (partitioned both by function and data, so Users A-L on Database1, Users M-Z on Database2, Items 1-10000 on Items1, etc.).
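A rough sketch of such a "smart" DAL routing rule, simply mirroring the example ranges and database names above (the real partitioning is certainly more involved):

    // Sketch of DAL shard routing: users partitioned alphabetically,
    // items partitioned into fixed-size id ranges.
    public class ShardRoutingSketch {
        static String databaseForUser(String username) {
            char first = Character.toUpperCase(username.charAt(0));
            return (first <= 'L') ? "UsersDatabase1" : "UsersDatabase2";
        }

        static String databaseForItem(long itemId) {
            // items 1-10000 -> Items1, 10001-20000 -> Items2, ...
            long shard = ((itemId - 1) / 10_000) + 1;
            return "Items" + shard;
        }

        public static void main(String[] args) {
            System.out.println(databaseForUser("alice"));    // UsersDatabase1
            System.out.println(databaseForUser("mallory"));  // UsersDatabase2
            System.out.println(databaseForItem(10_500));     // Items2
        }
    }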

They don't have state in the middle tier because they're partitioned by function. So, a normal user experience would involve more than 1 pool of app servers. Say you view an item (ViewAppServerPool), then go to bid on an item (BidAppServerPool). All of those app servers would have to stay in sync, which then requires a distributed cache to manage everything. But, their scale is so large that no distributed cache could effectively manage it, nor could a single database server. This means they have to shard the data tier, and any cache implementation would have to be split across the same boundaries.

This is similar to what I posted above, just moved down a layer. Instead of the web server determining which app server to contact, the app server determines which database to contact. Only, in eBay's case, it could actually be hitting 20+ database servers because of their partitioning strategy. But, again, the stateless tier has some sort of rule(s) that it uses to contact the stateful tier. eBay's rules, however, are a bit more complicated than the simplistic "User1 is on Server10" rule I was explaining above.

Mark Brackett
How do the stateless web servers find the correct app server? Does every web server have to know about each session any app server holds? Wouldn't this be horrific communication overhead?
SAL9000
The load balancers use your session id, or possibly your IP address, as an input to choose the app server. If every load balancer uses the same algorithm to choose the app server, it does not matter which load balancer you go through; you will always be sent to the same app server. No communication between app servers and load balancers is involved.
pi
+2  A: 

I don't know how they do it, but here are some suggestions:

  • To avoid overloading a load-balancer host itself, use round-robin DNS OR
  • Redirect different clients to different cluster addresses based on load, settings, geolocation, etc

To distribute middle tier load,

  • Embed the ID of the middle-tier session server inside the session ID cookie, as others have suggested. That way, which front-end box you hit is irrelevant; they can be added/removed without any impact.
  • If it's sufficiently important, have a mechanism for redirecting clients to an alternative middle-tier server during a session, so one can be taken down for maintenance, etc.
  • Clients start using a newly commissioned middle tier server as they start a new session

To distribute back end database load

  • "Conventional" sharding of "real time" per-account or per-user data
  • Asynchronously replicate slowly-changing or relatively static data; users could see it slightly out of date (but usually not by much). Middle-tier and web servers connect to a database local to their own location
MarkR