Well, I can't give you a book recommendation, but I can give you a few pointers from personal experience (our team runs an e-commerce/Customer Portal site that gets 2 million+ unique visitors per hour).
Some Notes:
- At this kind of load, we have to scale out. We currently run a small cluster (13 servers, I think) behind redundant F5 BigIP load balancers (http://www.f5.com/products/big-ip/). Hardware load balancers are great!
- Because we are in a farm, the default ASP.NET session provider is worthless. However, because Session is pluggable, we were able to write our own session provider without changing any application code, which lets us run a dedicated state server (there's a config sketch after this list).
- We heavily leverage robust (kind of) middle-tier solutions: we use SAP/R3 for purchasing and inventory management, and Tibco in the business tier for dealing with pre-existing customers. These applications run on their own farms and, for the most part, communicate with us via SOAP.
- We currently have two boxes dedicated to running SQL Server; these host just the catalog database, not the customer database. That database sits on a 32-machine farm.
- The SQL Server databases are definitely a pinch point and can easily overload and die during peak hours. Because of this we cache heavily at the application level: our own entity framework caches all stored procedure calls and skips a call entirely if its result is already in the cache (see the cache-aside sketch after this list).
- This cache is built up on each application server independently, since a single shared cache server would just be a single point of failure.
- If we were to invalidate all of the caches at once, it takes about 15 minutes under our current traffic load to rebuild them completely just from customer usage. However, doing so spikes our SQL Servers to 100% CPU.
- When we invalidate our cache, we do it one app server at a time, rolling each through during off-peak hours (a rough ops-script sketch of this follows the list).
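For what it's worth, the pluggable part really is just configuration plus a provider class. Below is a minimal web.config sketch: the first form is the stock ASP.NET out-of-process state server, and the second is how a hand-rolled provider gets registered. The host name, provider name, and type are placeholders for illustration, not our actual setup.

```xml
<!-- Option 1: the built-in out-of-process state server that ships with ASP.NET.
     "stateserver01" is a placeholder host; 42424 is the default state service port. -->
<sessionState mode="StateServer"
              stateConnectionString="tcpip=stateserver01:42424"
              timeout="20" />

<!-- Option 2: register a custom provider (a class deriving from
     SessionStateStoreProviderBase). The name and type below are made up. -->
<sessionState mode="Custom" customProvider="FarmSessionProvider">
  <providers>
    <add name="FarmSessionProvider"
         type="Example.Web.FarmSessionStateProvider, Example.Web" />
  </providers>
</sessionState>
```

Either way, pages keep using Session["..."] exactly as before, which is why no application code had to change.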
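To make the caching point more concrete, here is a minimal cache-aside sketch in the same spirit as what our entity layer does (illustrative only, not our actual code): the cache key is derived from the stored procedure name plus its parameter values, the in-process HttpRuntime.Cache acts as the per-server cache, and the database is only hit on a miss. The class and method names are invented for this example.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.Web;
using System.Web.Caching;

// Hypothetical helper illustrating the cache-aside idea described in the notes above:
// each app server keeps its own in-process cache, so a stored procedure is only
// executed when its result isn't already cached on that server.
public static class CachedSprocExecutor
{
    public static DataTable Execute(string connectionString, string procName,
                                    params SqlParameter[] parameters)
    {
        // Build a cache key from the proc name and its parameter values.
        string key = procName;
        foreach (SqlParameter p in parameters)
            key += "|" + p.ParameterName + "=" + p.Value;

        // Cache hit: skip the database entirely.
        DataTable cached = HttpRuntime.Cache[key] as DataTable;
        if (cached != null)
            return cached;

        // Cache miss: run the stored procedure and cache the result.
        DataTable table = new DataTable();
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(procName, conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddRange(parameters);
            using (SqlDataAdapter adapter = new SqlDataAdapter(cmd))
                adapter.Fill(table);
        }

        // The 15-minute absolute expiration here is arbitrary; the real
        // invalidation story is the rolling, per-server flush described above.
        HttpRuntime.Cache.Insert(key, table, null,
                                 DateTime.UtcNow.AddMinutes(15),
                                 Cache.NoSlidingExpiration);
        return table;
    }
}
```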
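And the rolling invalidation itself is conceptually nothing fancier than an ops script along these lines; the host names and the /admin/flush-cache endpoint are hypothetical placeholders, and the warm-up delay is just there so the SQL boxes only ever absorb one server's worth of cache rebuilding at a time.

```csharp
using System;
using System.Net;
using System.Threading;

// Sketch of a rolling cache flush: flush one app server at a time and wait
// for its cache to repopulate from live traffic before touching the next.
// Server names and the flush endpoint are placeholders, not real URLs.
class RollingCacheFlush
{
    static void Main()
    {
        string[] appServers = { "app01", "app02", "app03" }; // one entry per farm member
        TimeSpan warmUpDelay = TimeSpan.FromMinutes(20);      // time for the cache to rebuild

        using (WebClient client = new WebClient())
        {
            foreach (string server in appServers)
            {
                Console.WriteLine("Flushing cache on {0}...", server);
                client.DownloadString("http://" + server + "/admin/flush-cache");

                // Let this server's cache warm back up before moving on.
                Thread.Sleep(warmUpDelay);
            }
        }
    }
}
```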
That said, there is plenty of room for improvement. We can further optimize our databases (they are nearly 100% read, so cached denormalized views are a good option, as are additional clustered indexes).
We have also been working with our content team to lighten the actual page sizes. Some of our pages can be up to 500k, which is horrible in my opinion. We are also investigating a CDN to serve all static content, freeing our servers to handle just the dynamic content.
Anyway, this just gives you a little window into the life that is large-scale website programming, and hopefully I didn't scare you :D