How do large web sites which cannot be completely stateless achieve extreme scalability at the web tier?
There are sites like eBay and Amazon which cannot be completely stateless, as they have a shopping cart or something like that. It isn't feasible to encode every item of the shopping cart into the URL, nor is it feasible to encode every item into a cookie and send it with every request. So Amazon simply stores a session id in the cookie that is sent. So I understand that the scalability of the web tier of eBay and Amazon must be much harder to achieve than the scalability of the Google search engine, where everything can be encoded RESTfully into the URL.
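For illustration, here is a minimal servlet-style sketch of that idea (not Amazon's actual code, obviously): the browser only ever carries a session-id cookie, while the cart itself lives server-side in the HttpSession. The servlet name and the "cart" attribute are made up.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// Hypothetical cart servlet: the client only sees a JSESSIONID cookie,
// the cart contents never leave the server.
public class AddToCartServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // getSession(true) issues the session-id cookie on the first request
        HttpSession session = req.getSession(true);

        @SuppressWarnings("unchecked")
        List<String> cart = (List<String>) session.getAttribute("cart");
        if (cart == null) {
            cart = new ArrayList<>();
            session.setAttribute("cart", cart);
        }
        cart.add(req.getParameter("itemId")); // state stays on the server

        resp.setContentType("text/plain");
        resp.getWriter().println("Items in cart: " + cart.size());
    }
}
```

This is exactly what makes the web tier stateful: the request alone no longer says everything, the server that holds the HttpSession has to be found again.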
On the other hand, both eBay and Amazon have scaled absolutely massively. Rumor has it that eBay runs some 15,000 J2EE application servers.
How do these sites handle both extreme scalability and statefulness? Since the site is stateful, simple DNS round-robin balancing isn't feasible. So one would assume that these companies have a hardware-based load balancer like BIG-IP, NetScaler or something similar, which is the sole device behind the site's single IP address. This load balancer would terminate SSL (if used), inspect the cookie and decide, based on the session id in that cookie, which application server holds that customer's session.
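In pseudo-Java, the routing decision such a sticky-session balancer makes boils down to something like the following sketch. The class and map names are invented; a real appliance of course does this in optimized native code after SSL termination.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of cookie-based session affinity ("sticky sessions").
public class SessionAffinityRouter {

    // session id -> backend that owns the session
    private final Map<String, String> backendBySession = new ConcurrentHashMap<>();
    private final String[] backends;
    private int next = 0;

    public SessionAffinityRouter(String... backends) {
        this.backends = backends;
    }

    /** Pick a backend for a request, given its raw Cookie header. */
    public synchronized String route(String cookieHeader) {
        String sessionId = extractSessionId(cookieHeader);
        if (sessionId != null && backendBySession.containsKey(sessionId)) {
            return backendBySession.get(sessionId);          // stick to the owning server
        }
        String backend = backends[next++ % backends.length]; // new session: round-robin
        if (sessionId != null) {
            backendBySession.put(sessionId, backend);
        }
        return backend;
    }

    private static String extractSessionId(String cookieHeader) {
        if (cookieHeader == null) return null;
        for (String part : cookieHeader.split(";")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].equals("JSESSIONID")) return kv[1];
        }
        return null;
    }
}
```

The obvious catch, and the point of my question, is that the backendBySession table in this sketch has to be kept somewhere and sized for every active session.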
But this just can't possibly work, can it? No single load balancer could handle the load of thousands of application servers; I would imagine that even these hardware load balancers do not scale to such a level.
Also, the load balancing is done transparently for the user, i.e. users are not forwarded to different addresses but all collectively stay at www.amazon.com the whole time.
So my question is: is there some special trick with which one can achieve something like transparent sharding of the web tier (as opposed to the database tier, where this is commonly done)? As long as the cookie isn't inspected, there is no way to know which application server holds the session.
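One trick I am aware of that avoids a shared session table is to make the session id itself carry the routing information: Tomcat, for instance, appends a jvmRoute suffix to the JSESSIONID so that mod_jk can route purely by parsing the cookie; alternatively, a deterministic hash of the session id can pick the server. A rough sketch of both ideas (the server names are invented):

```java
// Sketch of two ways a balancer can route from the cookie alone, without any
// shared lookup table:
//  (a) the application server embeds its own identity in the session id,
//      e.g. Tomcat's jvmRoute convention "A1B2C3D4.app17", and the balancer
//      simply reads the suffix;
//  (b) otherwise, a deterministic hash of the session id always yields the same server.
public class StatelessSessionRouter {

    private final String[] servers; // e.g. {"app01", "app02", ...}

    public StatelessSessionRouter(String... servers) {
        this.servers = servers;
    }

    public String route(String sessionId) {
        // (a) jvmRoute-style suffix: "<random id>.<serverName>"
        int dot = sessionId.lastIndexOf('.');
        if (dot >= 0) {
            return sessionId.substring(dot + 1);
        }
        // (b) hash-based sharding: same session id -> same server on every balancer node
        int bucket = Math.floorMod(sessionId.hashCode(), servers.length);
        return servers[bucket];
    }
}
```

But even then, every request still has to pass through some device that parses the cookie, which brings me back to the scalability of that device.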
Edit: I realized that there is only a need for transparency if the site needs to be spidered and bookmarked. E.g. if the site is a mere web app, something like a plane or train ticket reservation system, there should be no problem with just redirecting users to specific clusters of web servers behind different URLs, e.g. a17.ticketreservation.com. In this specific case, it would be feasible to just use multiple clusters of application servers, each behind its own load balancer. Interestingly, I did not find a site which uses this kind of concept.

Edit: I found this concept discussed at highscalability.com, where the discussion refers to an article by Lei Zhu named "Client Side Load Balancing for Web 2.0 Applications". Lei Zhu uses cross-scripting to do this client-side load balancing transparently.
Even if there are drawbacks, like bookmarking, XSS, etc., I do think that this sounds like an extremely good idea for certain special situations, namely almost content-free web applications which do not need to be spidered or bookmarked (e.g. ticket reservation systems or the like). Then there is no need to do the load balancing transparently.
There could be a simple redirect from the main site to the server, e.g. a redirect from www.ticketreservation.com to a17.ticketreservation.com. From there on, the user stays on a17. a17 is not a single server but a cluster in itself, through which redundancy could be achieved.
The initial redirect server could itself be a cluster behind a load balancer. This way, very high scalability could be achieved, as the primary load balancer behind www is only hit once, at the beginning of each session.
Of course, the redirect to different URLs looks extremely nasty, but for mere web applications (which do not need to be spidered, deep-linked or deep-bookmarked anyway), this should be only a cosmetic problem for the user?
The redirect cluster could poll the load of the application clusters and adapt its redirects accordingly, thus achieving balancing and not mere load distribution.
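To make the idea concrete, here is a rough servlet sketch of such a redirect entry point (the cluster names and load figures are placeholders, and how the load is polled, e.g. via JMX or a health URL, is left out entirely): it sends each new session to whichever cluster currently reports the lowest load, and from then on the user talks to that cluster directly.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch of the entry point at www.ticketreservation.com: each new visitor is
// redirected to the application cluster (a17, a18, ...) with the lowest load.
public class ClusterRedirectServlet extends HttpServlet {

    // cluster host -> last polled load figure, updated by some background poller
    private final Map<String, Double> clusterLoad = new ConcurrentHashMap<>();

    @Override
    public void init() {
        // placeholder values; a real poller would refresh these continuously
        clusterLoad.put("a17.ticketreservation.com", 0.42);
        clusterLoad.put("a18.ticketreservation.com", 0.77);
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String target = clusterLoad.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(IllegalStateException::new);
        // From here on the user stays on the chosen cluster.
        resp.sendRedirect("https://" + target + "/");
    }
}
```

Since the redirect servlet holds no per-session state itself, the www cluster stays trivially small compared to the application clusters behind it.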