views: 133
answers: 6

Hello buddies!

I want to implement a fast sorting algorithm for a homework assignment, using parallel processing for this task. I've heard that the parallel version of Quicksort is the best choice, but I'm not sure about this... maybe Heapsort is a good idea. Which algorithm do you think is best for a parallelized environment, and why?

Regards.

+3  A: 

Merge sort is a great first parallel sorting technique. The best sort is always machine dependent and generally involves a combination of sorting techniques for different size inputs.

NickLarsen
A: 

Quicksort is recursive. A simple way to make any recursive algorithm parallel (as long as it makes two or more recursive calls, as quicksort does) is to spawn a new thread for each recursive call, wait until they are done, and then finish your function. This is by no means optimal, but it is a fairly quick and dirty way of parallelizing recursive calls.
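
A minimal sketch of that quick-and-dirty approach in C++ with std::thread (the name parallel_quicksort and the depth cutoff are my own illustrative choices, not from the answer; real code would also fall back to a sequential sort below some partition size):

```cpp
#include <iostream>
#include <thread>
#include <utility>
#include <vector>

// Sort a[lo..hi] in place; spawn a thread per recursive call until depth runs out.
void parallel_quicksort(std::vector<int>& a, int lo, int hi, int depth) {
    if (lo >= hi) return;

    // Partition around the last element (Lomuto scheme).
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);

    if (depth > 0) {
        // One recursive call runs in a new thread, the other in this thread;
        // wait for the spawned thread before returning.
        std::thread left(parallel_quicksort, std::ref(a), lo, i - 1, depth - 1);
        parallel_quicksort(a, i + 1, hi, depth - 1);
        left.join();
    } else {
        parallel_quicksort(a, lo, i - 1, 0);
        parallel_quicksort(a, i + 1, hi, 0);
    }
}

int main() {
    std::vector<int> v = {9, 3, 7, 1, 8, 2, 6, 5, 4, 0};
    parallel_quicksort(v, 0, static_cast<int>(v.size()) - 1, 2);  // at most ~4 threads
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}
```

The depth limit is just a cheap way to cap the number of threads; without it, a thread per recursive call quickly swamps the scheduler.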

aepurniet
A: 

I actually worked on a parallel sorting algorithm for a parallelization library a while back and came to the conclusion that it's not worth doing. For small datasets, the cost of even a few synchronization primitives makes the parallel sort slower than a regular sort. For large datasets, you're mostly bound by shared memory bandwidth and you get minimal speedups. For the case of sorting a large number (I think 10 million) of integers, I was only able to get a <1.5x speedup on a dual core using a parallel quicksort, IIRC.

Edit:

Most of the programming I do is number crunching, so I tend to think in terms of sorting simple primitives. I still think a parallel sort is a bad idea for these cases. If you're sorting things that are expensive to compare, though, this answer doesn't apply.

dsimcha
Quick sort doesn't work in parallel very well. Merge sort works in parallel much, much better.
Dean J
I think I tried that too, though I don't remember the details except that nothing I tried worked particularly well.
dsimcha
Your problem was that you were trying to parallelize something that was already fast. Sorting integers is trivial because the comparisons get lost in the noise. Try parallelizing a sort of 50k items where each comparison takes 1ms, then tell me it's not worth it.
Gabe
@Gabe: You may have an interesting point. I can't think of any object I've ever worked with where the comparison is that expensive, but if the comparison did take this long and wasn't memory bandwidth-bound, then you're right, a parallel sort would probably work quite well.
dsimcha
Comparing anything that isn't an atomic operation for a given processor can yield a significant speedup on large data sets when sorting in parallel. Even something as small as a 128-bit value.
NickLarsen
@Nick: I think the main point here is that the divisions (using a divide and conquer algorithm) need to be expensive enough to not be overwhelmed by the synchronization overhead **and** the algorithm needs to not be memory bandwidth bound.
dsimcha
@dsimcha: I understand that, hence my answer from earlier today where I noted that sorting algorithm optimizations are typically machine based. I was merely expanding on Gabe's comment.
NickLarsen
+4  A: 

Quicksort can split the unsorted list into two parts, but unfortunately the parts aren't guaranteed to be anywhere near even in size. So one machine (or half of a cluster of machines) could get 20 entries, and the other half could get 20 billion.

I can't think of a good way to make heapsort work in parallel. It can be done, but man, that feels really counterintuitive.

Merge sort is the one I think you want.

  • Each split is exactly 50% of the list, so it's easy to split between processors (see the sketch after this list).
  • You can implement merge sort on two sets of tape drives, which means that it doesn't require the whole list to be in memory at one time. For large lists, especially ones larger than the memory you have available, that's a must-have.
  • Merge sort is also stable in parallel implementations, if it matters.
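
For illustration, here is a rough top-down sketch in C++ of the in-memory parallel merge sort described above (the out-of-core tape-drive variant is not shown); parallel_mergesort and the depth limit are names I'm introducing, not part of the answer:

```cpp
#include <algorithm>
#include <iostream>
#include <thread>
#include <vector>

// Sort a[lo..hi) in place; each half is sorted by its own thread until depth runs out.
void parallel_mergesort(std::vector<int>& a, std::size_t lo, std::size_t hi, int depth) {
    if (hi - lo < 2) return;
    std::size_t mid = lo + (hi - lo) / 2;   // the split is always exactly 50%

    if (depth > 0) {
        std::thread left(parallel_mergesort, std::ref(a), lo, mid, depth - 1);
        parallel_mergesort(a, mid, hi, depth - 1);
        left.join();
    } else {
        parallel_mergesort(a, lo, mid, 0);
        parallel_mergesort(a, mid, hi, 0);
    }

    // Merge the two sorted halves; std::inplace_merge keeps the sort stable.
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
}

int main() {
    std::vector<int> v = {5, 2, 9, 1, 7, 3, 8, 0, 6, 4};
    parallel_mergesort(v, 0, v.size(), 2);
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}
```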
Dean J
+1  A: 

How about thinking about this in two steps.

Step 1. Break my data down into N chunks, where N is my number of processors/nodes/cores. Sort each chunk.

Step 2. Combine my N chunks together.

For sorting the N chunks, you can use whatever you want, based on your data: quicksort, heapsort, I don't care. For step 2, the merge step of merge sort handles combining two sorted lists really well, so that is probably your best bet.
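
A hedged sketch of this two-step idea in C++ with std::thread (two_step_sort, the chunk boundaries, and the pairwise inplace_merge in step 2 are my own illustrative choices, not from the answer):

```cpp
#include <algorithm>
#include <iostream>
#include <thread>
#include <vector>

void two_step_sort(std::vector<int>& a, unsigned n_chunks) {
    std::size_t n = a.size();
    std::vector<std::size_t> bounds;            // chunk boundaries: bounds[c]..bounds[c+1]
    for (unsigned c = 0; c <= n_chunks; ++c)
        bounds.push_back(c * n / n_chunks);

    // Step 1: sort each chunk in its own thread (any sequential sort works here).
    std::vector<std::thread> workers;
    for (unsigned c = 0; c < n_chunks; ++c)
        workers.emplace_back([&a, &bounds, c] {
            std::sort(a.begin() + bounds[c], a.begin() + bounds[c + 1]);
        });
    for (auto& t : workers) t.join();

    // Step 2: merge the sorted chunks pairwise (the "merge" half of merge sort).
    for (unsigned c = 1; c < n_chunks; ++c)
        std::inplace_merge(a.begin(), a.begin() + bounds[c], a.begin() + bounds[c + 1]);
}

int main() {
    std::vector<int> v = {4, 11, 2, 9, 0, 7, 5, 13, 1, 6, 12, 3, 10, 8};
    two_step_sort(v, std::max(1u, std::thread::hardware_concurrency()));
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}
```

The sequential pairwise merge in step 2 is the simplest option; a tree of merges or a k-way merge would parallelize that phase too.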

Peter Recore
+1  A: 

As Dean J mentions, merge sort is a good candidate. But it has the disadvantage of requiring a synchronization step once both threads are done (the merging process).

Though quicksort has the disadvantage that its partitions are unpredictable in size, what you can do is choose the first partition (the one that decides the processor load) carefully so that it divides the load more or less evenly, and then let the algorithm take its course.

The advantage is that you don't need to do any synchronization after the processors are done with their work. Once they finish, you have the sorted array ready, without the need for an extra merging step, which might be costly.
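
As an illustration only (not Lazer's code), here is a C++ sketch of that idea, assuming std::thread: sample a pivot near the median, do one partition to decide the per-thread load, and let each thread sort its half in place, with no merge step afterwards.

```cpp
#include <algorithm>
#include <iostream>
#include <thread>
#include <vector>

void balanced_two_way_sort(std::vector<int>& a) {
    if (a.size() < 2) return;

    // Choose a pivot close to the true median by taking the median of a small sample.
    std::vector<int> sample = {a.front(), a[a.size() / 2], a.back()};
    std::sort(sample.begin(), sample.end());
    int pivot = sample[1];

    // One careful partition decides the per-thread load.
    auto mid = std::partition(a.begin(), a.end(),
                              [pivot](int x) { return x < pivot; });

    // Each thread sorts its own half in place; the ranges don't overlap,
    // so the whole array is sorted as soon as both threads are done.
    std::thread left([&a, mid] { std::sort(a.begin(), mid); });
    std::sort(mid, a.end());
    left.join();
}

int main() {
    std::vector<int> v = {8, 3, 12, 1, 9, 4, 15, 0, 7, 11, 2, 6};
    balanced_two_way_sort(v);
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}
```

A three-element sample is the cheapest option; sampling more elements gives a better balanced first split at a small extra cost.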

Lazer