views:

122

answers:

3

I have to use N(i=1..N) threads to sort an array of M numbers,each thread start to sort the array from position (N*i)%m. Could anyone help me?

+3  A: 

What you will want to do is use a divide and conquer sorting method like quick sort.

What you will want to do is partition the array, and then pass the two halves of the array off to another thread to do the processing.

Say you have the number:

11 43 24 56 12 65 90 12 53 23

In one thread, you will partition the numbers:

12 24 11 23 12 | 65 90 53 56 43

Then you can perform quick sort on each half of the array in a different thread.

Allow me to provide some code:

public void multiThreadSort(int threads, int[] arr, int start, int stop) {
    if (threads > 1) {
        int midpoint = partition(arr, start, stop);
        new Thread(){public void run() {
              multiThreadSort(threads - 1, arr, start, midpoint);
        }}.start();
        new Thread(){public void run() {
              multiThreadSort(threads - 1, arr, midpoint, stop);
        }}.start();
    }
    else 
        Arrays.sort(arr, start, stop);
}

public int partition(int[] arr, int start, int stop);

Then call it like so:

multiThreadSort(N, arr, 0, arr.length());
jjnguy
I like that this answer actually answers the question, not try and tell the asker that its a bad idea (which for small cases, it is).
Shynthriir
@Shy, yeah, in small cases it's a bad idea. But for large datasets and educational purposes, it's a good idea.
jjnguy
Hmm.. What do you do when you have no more threads to "delegate" to? Do you start sorting the remaining chunks with a single-threaded algorithm?
aioobe
@aioobe, yup. You just sort the remaining chunks on that thread. And, because everything is already partitioned, you don't have to merge.
jjnguy
While this would indeed do a threaded sort of the array, it neither limits the sort to N threads, nor have threads start to sort at position (N*i)%m. The former could be fixed by utilize a fixedThreadPool(N), the latter cannot be fixed using quicksort.
Luke Hutteman
To use the specified partitioning, you would need to use mergesort instead, starting N threads to sort their individual partitions, and then merging the results of these sorts.
Luke Hutteman
Ah, ok, I see. Thats nice. But it utilizes the threads poorly during the partitioning phase I suppose? Since only one thread would be working... but perhaps it pays off in the end.
aioobe
@Luke Hutteman, see my answer ;-)
aioobe
@Luke, on an input of size N^2 you can keep spawning threads, until you have created N seperate threads, and then you will just do a single threaded sort. This will end up making each thread sort starting from some position `(N*i)%m`.
jjnguy
@jinguy yes, but this is a very specific case as far as the input-size is concerned. Also, partitioning in quicksort does not typically happen at the exact mid-point (which causes quicksort to have a worst-case performance of O(N^2))
Luke Hutteman
@Luke, you have convinced me that my answer is perhaps not the perfect solution the the OPs problem. But, I will hold that is is the best sorting approach for multiple threads.
jjnguy
agreed; quicksort in general is faster than mergesort, and also has the additional advantage that it can be done in-place. While it does not fit the OP's problem perfectly, it would certainly be the approach I would take as well to thread a sorting algorithm.
Luke Hutteman
+1  A: 

You could let each thread sort its own portion of the array using Arrays.sort(arr, fromIndex, toIndex) and after all threads are done, simply use a merge sort to merge the different results. (Even this step could be parallelized by merging multiple portions at a time.)

Here is a related question / good answer.

aioobe
If you are multi-threading for performance, performing full sorts on each part, and then merging them isn't the best option.
jjnguy
hmm, no? care to elaborate?
aioobe
Well, there is a better option than fully sorting and then merging. A distributed quick sort will perform better. (on large data sets)
jjnguy
I see how this might better answer the specific question.
jjnguy
But, if each thread can only look at one small part of the array, it is impossible to properly sort the entire thing.
jjnguy
@jinguy: mergesort uses a divide-and-conquer approach just like quicksort; the difference is that were quicksort uses an O(N) step to partition the array prior to spawning the left- and right- sorts, mergesort sorts both first and then uses an O(N) step to merge the sorted arrays.
Luke Hutteman
@Luke, I understand that. What I'm not sure about in general is: How is the array going to be fully sorted if each thread can only look at a part of the array?
jjnguy
A: 

Another question is why do you want to do this? Thread creation and destruction is rather heavy, sorting (if you can keep your set in-memory) is rather quick. Doing this make sense only if you have the threads pre-existing in a thread pool for some other reason, if the time to sort M is much greater than the time to create (and probably to destroy) a thread, or if this is an academic exercise to learn about thread manipulation (in which case you shouldn't be asking here). If the time to sort M is greater than the time to create a thread you are probably going to have part of your array in virtual memory and disk paging effects will dominate your performance.

Threads are very useful, even essential, for some algorithms. Sorting is not a good application. Except, as mentioned, as a learning exercise to get experience writing software where your threads play well together.

verisimilidude
Giant arrays could benefit from being sorted in multiple threads.
jjnguy
"Thread creation and destruction is rather heavy" you have a reference on that? I believe it is synchronization between threads that is expensive.
aioobe
@aioobe, if you don't need to synchronize the threads, this isn't a problem.
jjnguy
Yep. That's what I thought.
aioobe
Creating a thread requires creation of a Thread object (which requires memory allocation and cycles to check its liveness), creation of an underlying OS thread (impact varies by OS), and adding the thread into the scheduler. None of these processor cycles are required when using a reasonable data structure to implement an in-process quicksort. And @jjnguy, I did say that large arrays could benefit when the time to sort is larger than the time to create a thread. Until you come up with some actual benchmark data I will stand by my assertion that multithreading and sorting don't go together.
verisimilidude