I am particularly interested in how one deals with a huge amount of information for a commercial service like Google Search or Google Maps. We all know they use (or at least did) some kind of Linux clusters, but how exactly are they organized? What kind of hardware do they use, what file systems, what networking, and what problems are the most frequent?
The answer depends on what you are trying to do. Google created their own distributed database (Bigtable), their own computing farms, and a lot of other infrastructure. If you decide to go the same way, there is no simple answer as to what to do, but you will certainly need millions in investment in infrastructure and development. Matthew provided a link in the comments to material about what Google did.
However, if your goal is to build a web application, you may not want to spend time building that infrastructure yourself, but rather use what is already on the market. If you want an application that can handle huge amounts of data and serve millions of customers every hour, then you should definitely look at cloud infrastructure like Amazon Elastic Compute Cloud and Microsoft Azure.
The advantage is that, for a reasonable price, you get a huge computing farm with ready-made database services, immediate scalability, and none of the IT costs of running it yourself. You can scale from just one server to hundreds and back to one once a demand spike has passed.
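As a rough illustration of that kind of scaling, here is a minimal sketch using the boto3 Python SDK (a library not mentioned above; the group name "web-tier" and the capacity values are hypothetical):

```python
import boto3

# Assumes AWS credentials and region are already configured in the environment;
# "web-tier" is a hypothetical EC2 Auto Scaling group name.
autoscaling = boto3.client("autoscaling")

def scale_web_tier(desired_capacity: int) -> None:
    """Set the number of running instances in the web tier."""
    autoscaling.set_desired_capacity(
        AutoScalingGroupName="web-tier",
        DesiredCapacity=desired_capacity,
        HonorCooldown=False,  # apply the change immediately
    )

# Scale out to 100 instances for a traffic spike ...
scale_web_tier(100)
# ... and back down to a single instance once it has passed.
scale_web_tier(1)
```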
Here's a description of Amazon Elastic Compute Cloud: http://aws.amazon.com/ec2/
Here are descriptions of their scalable database services: http://aws.amazon.com/simpledb/ and http://aws.amazon.com/rds/
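To give a feel for the scalable database side, here is a minimal sketch using boto3's SimpleDB client (the "customers" domain and its attributes are hypothetical, purely for illustration):

```python
import boto3

# SimpleDB is a schema-less key/attribute store; a "domain" is roughly a table.
sdb = boto3.client("sdb")

sdb.create_domain(DomainName="customers")

# Store an item as a set of name/value attributes.
sdb.put_attributes(
    DomainName="customers",
    ItemName="customer-42",
    Attributes=[
        {"Name": "name", "Value": "Alice", "Replace": True},
        {"Name": "country", "Value": "DE", "Replace": True},
    ],
)

# Query with a SQL-like select expression.
result = sdb.select(
    SelectExpression="select * from `customers` where country = 'DE'"
)
for item in result.get("Items", []):
    print(item["Name"], item["Attributes"])
```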