views:

305

answers:

5

There are certain algorithms whose running time can decrease significantly when one divides up a task and gets each part done in parallel. One of these algorithms is merge sort, where a list is divided into infinitesimally smaller parts and then recombined in a sorted order. I decided to do an experiment to test whether or not I could I increase the speed of this sort by using multiple threads. I am running the following functions in Java on a Quad-Core Dell with Windows Vista.

One function (the control case) is simply recursive:

// x is an array of N elements in random order
public int[] mergeSort(int[] x) {
    if (x.length == 1) 
        return x;

    // Dividing the array in half
    int[] a = new int[x.length/2];
    int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)];
    for(int i = 0; i < x.length/2; i++) 
        a[i] = x[i];
    for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++) 
        b[i] = x[i+x.length/2];

    // Sending them off to continue being divided
    mergeSort(a);
    mergeSort(b);

    // Recombining the two arrays
    int ia = 0, ib = 0, i = 0;
    while(ia != a.length || ib != b.length) {
        if (ia == a.length) {
            x[i] = b[ib];
            ib++;
        }
        else if (ib == b.length) {
            x[i] = a[ia];
            ia++;
        }
        else if (a[ia] < b[ib]) {
            x[i] = a[ia];
            ia++;
        }
        else {
            x[i] = b[ib];
            ib++;
        }
        i++;
    }

    return x;
}

The other is in the 'run' function of a class that extends thread, and recursively creates two new threads each time it is called:

public class Merger extends Thread
{
    int[] x;
    boolean finished;

    public Merger(int[] x)
    {
        this.x = x;
    }

    public void run()
    {
        if (x.length == 1) {
            finished = true;
            return;
        }

        // Divide the array in half
        int[] a = new int[x.length/2];
        int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)];
        for(int i = 0; i < x.length/2; i++) 
            a[i] = x[i];
        for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++) 
            b[i] = x[i+x.length/2];

        // Begin two threads to continue to divide the array
        Merger ma = new Merger(a);
        ma.run();
        Merger mb = new Merger(b);
        mb.run();

        // Wait for the two other threads to finish 
        while(!ma.finished || !mb.finished) ;

        // Recombine the two arrays
        int ia = 0, ib = 0, i = 0;
        while(ia != a.length || ib != b.length) {
            if (ia == a.length) {
                x[i] = b[ib];
                ib++;
            }
            else if (ib == b.length) {
                x[i] = a[ia];
                ia++;
            }
            else if (a[ia] < b[ib]) {
                x[i] = a[ia];
                ia++;
            }
            else {
                x[i] = b[ib];
                ib++;
            }
            i++;
        }

        finished = true;
    }
}

It turns out that function that does not use multithreading actually runs faster. Why? Does the operating system and the java virtual machine not "communicate" effectively enough to place the different threads on different cores? Or am I missing something obvious?

+1  A: 

The overhead cost of synchronization may be comparatively large and prevent many optimizations.

Furthermore you are creating way too many threads.

The other is in the 'run' function of a class that extends thread, and recursively creates two new threads each time it is called.

You would be better off with a fixed number of threads, suggestively 4 on a quad core. This could be realized with a thread pool (tutorial) and the pattern would be "bag of tasks". But perhaps it would be better yet, to initially divide the task into four equally large tasks and do "single-threaded" sorting on those tasks. This would then utilize the caches a lot better.


Instead of having a "busy-loop" waiting for the threads to finish (stealing cpu-cycles) you should have a look at Thread.join().

aioobe
While this is generally the case with these problem, there is no synchronization in this example.
Peter Lawrey
Woops. In that case there should be some nasty race-conditions?
aioobe
Oh, I see the busy-waiting loop now.
aioobe
Yes this code _could_ suffer from memory consistency effects too, since `finished` isn't volatile (the busy wait loop might never complete). I think also the array elements need a joint to form a happens-before relationship.
Justin
Thanks for your suggestions and the tutorial, aioobe. I found that Thread.join() is actually much less affective than use of a CountDownLatch.
Robz
A: 

How many elements in the array you have to do sort? If there are too few elements, the time of sync and CPU switching will over the time you save for dividing the job for paralleling

vodkhang
N elements are in the array, and N is a very large number (greater than 1 million).
Robz
@Robz, I'm sure you'd be shocked to discover that the Sun implementation of Arrays.sort has a minimum threshold value for which it uses insertion sort. Size matters, period.
Tim Bender
Yeah, I know that:)
vodkhang
Actually I'm not sure that it would matter how arbitrarily large the array is for this potential "size matters" problem to occur. Merge sort, as well as quick sort, uses the divide-and-conquer method. Thus, if X is some cutoff and N is larger than X then eventually during the course of sorting the entire array a sub-array would be sent to the merge/quick sort function that is of length less than X.Interestingly, in my personal experience a simple sequential quick sort that does not use a cutoff nor insertion sort calls goes just as fast as Arrays.sort with N being 1 million or more.
Robz
+5  A: 

The problem is not multi-threading: I've written a correctly multi-threaded QuickSort in Java and it owns the default Java sort. I did this after witnessing a gigantic dataset being process and had only one core of a 16-cores machine working.

One of your issue (a huge one) is that you're busy looping:

 // Wait for the two other threads to finish 
 while(!ma.finished || !mb.finished) ;

This is a HUGE no-no: it is called busy looping and you're destroying the perfs.

(Another issue is that your code is not spawning any new threads, as it has already been pointed out to you)

You need to use other way to synchronize: an example would be to use a CountDownLatch.

Another thing: there's no need to spawn two new threads when you divide the workload: spawn only one new thread, and do the other half in the current thread.

Also, you probably don't want to create more threads than there are cores availables.

See my question here (asking for a good Open Source multithreaded mergesort/quicksort/whatever). The one I'm using is proprietary, I can't paste it.

http://stackoverflow.com/questions/2210185/correctly-multithreaded-quicksort-or-mergesort-algo-in-java

I haven't implemented Mergesort but QuickSort and I can tell you that there's no array copying going on.

What I do is this:

  • pick a pivot
  • exchange values as needed
  • have we reached the thread limit? (depending on the number of cores)
    • yes: sort first part in this thread
    • no: spawn a new thread
  • sort second part in current thread
  • wait for first part to finish if it's not done yet (using a CountDownLatch).

The code spawning a new thread and creating the CountDownLatch may look like this:

            final CountDownLatch cdl = new CountDownLatch( 1 );
            final Thread t = new Thread( new Runnable() {
                public void run() {
                    quicksort(a, i+1, r );
                    cdl.countDown();
                }
            } };

The advantage of using synchronization facilities like the CountDownLatch is that it is very efficient and that your not wasting time dealing with low-level Java synchronization idiosynchrasies.

In your case, the "split" may look like this (untested, it is just to give an idea):

if ( threads.getAndIncrement() < 4 ) {
    final CountDownLatch innerLatch = new CountDownLatch( 1 );
    final Thread t = new Merger( innerLatch, b );
    t.start();
    mergeSort( a );
    while ( innerLatch.getCount() > 0 ) {
        try {
            innerLatch.await( 1000, TimeUnit.SECONDS );
        } catch ( InterruptedException e ) {
            // Up to you to decide what to do here
        }
    }
} else {
    mergeSort( a );
    mergeSort( b );
}

(don't forget to "countdown" the latch when each merge is done)

Where you'd replace the number of threads (up to 4 here) by the number of available cores. You may use the following (once, say to initialize some static variable at the beginning of your program: the number of cores is unlikely to change [unless you're on a machine allowing CPU hotswapping like some Sun systems allows]):

Runtime.getRuntime().availableProcessors()
Webinator
+1 for busy-looping concept.
Bragboy
oops silly me, I should have rewritten it instead of using your code: in the case where you do not spawn a new thread, it is pointless so split into 'a' and 'b' and then do a mergeSort(a) and mergeSort(b)... Simply mergeSort directly the whole array, before splitting.
Webinator
Why on Earth would you put the call to CDL.await() in a while loop?Also, you're conditional (threads.getAndIncrement() < 4) would cause the 'count' of threads created to go up regardless of whether or not you spawn one. Likewise, you never really indicate when you reduce that count (though it could be assumed).
Tim Bender
@Tim Bender: because a broken 3rd party API may be wrongly interrupt()ing me awaiting. In that case you *may* or *may not* want to keep waiting, which is why I put a huge comment saying *"//Up to you to decide what to do here"* (you could decide to decrement the latch and exit the loop, if you think the interrupt is legit). The number of threads is fine when it is incremented: it's only purpose is to not spawn more than 'x' threads. Seen that I'll never "roll over" 2**32-1 threads this won't overflow and the code will work just fine. Nitpick or help the OP: choose one. ;)
Webinator
CountDownLatch is quite magical, and actually much better than thread.join(). I decided to go into quick sort as well. Being able to dynamically determine the number of processors is also a fantastic addition. I wouldn't have thought to start only one thread and then complete the other 'half' sequentially if you hadn't suggested it. So thanks, and have a belated bravo, webinator!
Robz
+3  A: 

As others said; This code isn't going to work because it starts no new threads. You need to call the start() method instead of the run() method to create new threads. It also has concurrency errors: the checks on the finished variable are not thread safe.

Concurrent programming can be pretty difficult if you do not understand the basics. You might read the book Java Concurrency in Practice by Brian Goetz. It explains the basics and explains constructs (such as Latch, etc) to ease building concurrent programs.

Julien Rentrop
A: 

Alright many months after the fact: I believe I have successfully completed this project, thanks to everyone here who contributed advice and suggestions!

I ended up conducting my work on a duel-core Macbook Pro. I achieved a decent percentage of speed increase (not quite the ideal 50%, but close to it) with both merge and quick sort algorithms when switching from their respective sequential functions to the threaded version. I also blew Arrays.sort(int[] a) 'out of the water' with a threaded quick sort method. I will post the code in another question (as this one is fairly old) and the link to it here shortly, asking if there is any other optimizations or rookie mistakes still there.

Thanks to you guys, I now really believe in parallel programming.

Robz
http://stackoverflow.com/questions/3425126/java-parallelizing-quick-sort-via-multi-threading
Robz