Can someone explain what Hadoop is in terms of the ideas behind the software? What makes it so popular and/or powerful?
Hadoop implements Google's MapReduce algorithm. To understand it better, read Google's MapReduce paper at http://labs.google.com/papers/mapreduce.html
Hadoop is a programming environment that lets you run massive computations in parallel on a large cluster of machines. It is resilient to the loss of several machines, it scales (adding machines makes computations faster), and it is trackable (it reports the status of a running computation). Hadoop is popular because it is a strong open source environment and because many users, including large ones such as Yahoo!, Microsoft and Facebook, employ it for large data-crunching projects. It is powerful because it uses the map/reduce algorithm, which decomposes a computation into a sequence of two simple operations:
- map - Take a list of items and perform the same simple operation on each of them. For example, take the text of a web page, tokenize it and replace every token t with the string t:1
- reduce - Take a list of items and accumulate it using an accumulation operator. For example, take the list of t:1 strings, count the occurrences of each token t and output a list of the form t:nt, where nt is the number of times t appeared in the original list.
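To make the two operations concrete, here is a minimal single-process sketch of the word-counting example in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_tokens, shuffle, reduce_counts, word_count) are made up for illustration, and tokenizing by lowercasing and splitting on whitespace is an assumption. On a real cluster the grouping step between map and reduce is done by the framework, and the map and reduce calls run on different machines.

```python
from collections import defaultdict


def map_tokens(page_text):
    """Map step: emit a (token, 1) pair for every whitespace-separated token."""
    return [(token, 1) for token in page_text.lower().split()]


def shuffle(pairs):
    """Group the emitted 1s by token (the framework's job on a real cluster)."""
    grouped = defaultdict(list)
    for token, count in pairs:
        grouped[token].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: accumulate each token's list of 1s with addition."""
    return {token: sum(counts) for token, counts in grouped.items()}


def word_count(pages):
    """Run map over every page, then group and reduce the results."""
    pairs = []
    for page in pages:  # each page could be mapped on a different machine
        pairs.extend(map_tokens(page))
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    pages = ["the quick brown fox", "the lazy dog and the fox"]
    for token, count in sorted(word_count(pages).items()):
        print(f"{token}:{count}")
```

The point of the split is that map_tokens only ever looks at one page and reduce_counts only ever looks at one token's values, so neither needs to see the whole data set at once.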
Using proper decomposition (which the programmer does) and task distribution and monitoring (which Hadoop does), you get a fast, scalable computation; in our example, a word-counting computation. You can sequence tens of maps and reduces and get implementations of sophisticated algorithms. This is the very high level view. Now go read about MapReduce and Hadoop in further detail.