views:

167

answers:

3

I got asked this question once and still haven't been able to figure it out:

You have an array of N integers, where N is large, say, a billion. You want to calculate the median value of this array. Assume you have m+1 machines (m workers, one master) to distribute the job to. How would you go about doing this?

Since the median is a nonlinear operator, you can't just find the median in each machine and then take the median of those values.

+1  A: 

You could do a highly parallelizable sort (like merge sort) and get the median from the result.

glowcoder
A: 

Would sorting the array be overkill? If not, then divide up the array and then merge the results together is my suggestion.

JB King
+1  A: 

Depending on the Parallel Computation Model, algorithms could vary. (Note: the pdf linked to in previous sentence just contains some of the many possible ones).

Finding the median is a special case of finding the ith element. This problem is called 'selection problem', so you need to search the web for parallel selection.

Here is one paper (unfortunately, not free) which might be useful: Parallel Selection Algorithms With Analysis on Clusters.

And google's first link for the query "Parallel Selection" gives: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html which actually uses the median of medians for the general problem and not just median finding.

Moron

related questions