After finishing school in Computer Science and entering the "real world" as a software engineer working on web applications, I've become overwhelmed by the amount of information to be learned about scaling web applications properly. Some topics/questions that have recently popped up for me:
- RDBMSs vs. unstructured data storage.
- Advantages and disadvantages of sharding for databases, search indexes, etc.
- Which network file systems scale? Which don't?
- The cost of HTTP and DB connections.
- Static content delivery, why not to store images in a database, etc.
- Why is it better to use one thread pool vs. creating new ones all the time?
- More on memcached and alternatives.
- Common CPU-bound operations vs. IO-bound operations.
- Skinny tables...
- Better understanding of cookies...
- WSDLs, REST, SOAP...
- ORM, Hibernate...
- A billion other buzzwords...
I'm looking for a book, or a small set of books, that'll cover a wide array of topics relevant to building scalable web applications, including topics that aren't specific to web applications. While I can easily look up specific information on each of the above topics, I'm looking for books that'll (a) bring up more related topics/questions that I have yet to come across, and (b) tie topics together as much as possible.
It seems that some key categories for me are:
- DB performance and tuning
- Scalability of networked servers/filesystems/communication
- General performance and concurrency topics
- General web topics (e.g. cookies)
(This is not a comprehensive list, and you may be able to think of more important categories for someone in my situation.)
I'd also like to focus more on fundamentals than on the nitty-gritty of the latest and greatest technologies. I think it's important that I establish my engineering fundamentals before I dive deep into some random new technology.
So, back to the question: are there any books that you would recommend for someone in my situation? Any other methods for quickly building a breadth of knowledge?
Thanks!