Hi,

One of the main examples that is used in demonstrating the power of MapReduce is the Terasort benchmark. I'm having trouble understanding the basics of the sorting algorithm used in the MapReduce environment.

To me, sorting simply involves determining the relative position of an element in relation to all other elements. So sorting involves comparing "everything" with "everything". Your average sorting algorithm (quicksort, bubble sort, ...) simply does this in a smart way.

In my mind, splitting the dataset into many pieces means you can sort each piece individually, but then you still have to integrate these pieces into the 'complete', fully sorted dataset. Given a terabyte dataset distributed over thousands of systems, I expect this to be a huge task.

So how is this really done? How does this MapReduce sorting algorithm work?

Thanks for helping me understand.

+3  A: 

Google Reference: MapReduce: Simplified Data Processing on Large Clusters

Appeared in:
OSDI'04: Sixth Symposium on Operating System Design and Implementation,
San Francisco, CA, December, 2004.

That link has PDF and HTML-slide references.

There is also a Wikipedia page with a description and implementation references.

There is also criticism:

David DeWitt and Michael Stonebraker, pioneering experts in parallel databases and shared nothing architectures, have made some controversial assertions about the breadth of problems that MapReduce can be used for. They called its interface too low-level, and questioned whether it really represents the paradigm shift its proponents have claimed it is. They challenge the MapReduce proponents' claims of novelty, citing Teradata as an example of prior art that has existed for over two decades; they compared MapReduce programmers to Codasyl programmers, noting both are "writing in a low-level language performing low-level record manipulation". MapReduce's use of input files and lack of schema support prevents the performance improvements enabled by common database system features such as B-trees and hash partitioning, though projects such as PigLatin and Sawzall are starting to address these problems.

nik
I understand (most of) the concepts of MapReduce as described in the mentioned documents. I'm trying to understand the sorting algorithm.
Niels Basjes
A: 

Just guessing...

Given a huge set of data, you would partition the data into some chunks to be processed in parallel (perhaps by record number, i.e. records 1 - 1000 = partition 1, and so on).

Assign / schedule each partition to a particular node in the cluster.

Each cluster node will further break (map) its partition into its own mini partitions, perhaps by alphabetical order of the key. So, in partition 1, get me everything that starts with A and output it into mini partition A(x), where x is a sequential number. If an A(x) already exists, create a new one with the next x (perhaps it is the scheduler's job to hand out the next unique A(x) id).

Hand over (schedule) the jobs completed by the mappers (previous step) to the "reduce" cluster nodes. The reduce nodes will then further refine the sort of each set of A(x) parts, which can only happen once all the mapper tasks are done (you can't actually start sorting all the words starting with A while there is still a possibility that another A mini partition is in the making). Output the results into the final sorted partitions (i.e. Sorted-A, Sorted-B, etc.)

Once done, combine the sorted partitions into a single dataset again. At this point it is just a simple concatenation of n files (where n could be 26 if you are only doing A - Z).

There might be intermediate steps in between (further map and reduce steps after the initial reduce)... I'm not sure :).
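A toy sketch of that flow (hypothetical Python, just to illustrate the shape of the idea; real Hadoop jobs are normally written in Java):

    from collections import defaultdict

    def map_phase(partition):
        """Bucket each record of one partition by its first letter
        (the 'mini partitions' described above)."""
        buckets = defaultdict(list)
        for record in partition:
            buckets[record[0].upper()].append(record)
        return buckets

    def reduce_phase(records):
        """Sort all records for one letter; only safe once every
        mapper task is done."""
        return sorted(records)

    data = ["banana", "apple", "cherry", "avocado", "blueberry", "apricot"]
    partitions = [data[:3], data[3:]]            # split by record number

    mapped = [map_phase(p) for p in partitions]  # map step, one call per node

    shuffled = defaultdict(list)                 # hand mini partitions to reducers
    for buckets in mapped:
        for letter, records in buckets.items():
            shuffled[letter].extend(records)

    sorted_parts = {letter: reduce_phase(records)
                    for letter, records in shuffled.items()}

    # Final step: simple concatenation of Sorted-A, Sorted-B, ...
    result = [r for letter in sorted(sorted_parts) for r in sorted_parts[letter]]
    print(result)  # ['apple', 'apricot', 'avocado', 'banana', 'blueberry', 'cherry']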

Jimmy Chandra
+7  A: 

Here are some details on Hadoop's implementation for Terasort. I found the paper reference through James Hamilton's Blog Post.

Yuval F
Yes, that explains it perfectly! Thank you. It states: "TeraSort is a standard map/reduce sort, except for a custom partitioner that uses a sorted list of N − 1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i − 1] <= key < sample[i] are sent to reduce i. This guarantees that the output of reduce i are all less than the output of reduce i+1." So their trick is in the way they determine the keys during the map phase. Essentially, they ensure that every value in a single reducer is guaranteed to be 'pre-sorted' relative to the values in all other reducers.
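In code, that sampled-key partitioner boils down to something like this (a hypothetical Python sketch of the idea from the paper, not Hadoop's actual Java implementation):

    import bisect

    def make_partitioner(sample):
        """sample: sorted list of N - 1 keys drawn from the input,
        defining the key range for each of the N reducers."""
        def partition(key):
            # all keys with sample[i - 1] <= key < sample[i] go to reduce i
            return bisect.bisect_right(sample, key)
        return partition

    partition = make_partitioner(["g", "p"])   # 3 reducers, 2 sampled keys
    for key in ["apple", "kiwi", "zebra"]:
        print(key, "-> reduce", partition(key))
    # apple -> reduce 0, kiwi -> reduce 1, zebra -> reduce 2

Because every key routed to reduce i sorts before every key routed to reduce i+1, the reducer outputs can simply be concatenated at the end.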
Niels Basjes
The information was very helpful. I too was interested in sorting using MapReduce.
sul4bh