ansaurus

Question

Can't get any speedup from parallelizing Quicksort using Pthreads

Answer 1

+5 A:

Starting a thread has a fair amount of overhead. You'd probably be better off creating a threadpool with some fixed number of threads, along with a thread-safe queue to queue up jobs for the threads to do. The threads wait for an item in the queue, process that item, then wait for another item. If you want to do things really correctly, this should be a priority queue, with the ordering based on the size of the partition (so you always sort the smallest partitions first, to help keep the queue size from getting excessive).

This at least reduces the overhead of starting the threads quite a bit -- but that still doesn't guarantee you'll get better performance than a single-threaded version. In particular, a quick-sort involves little enough work on the CPU itself that it's probably almost completely bound by the bandwidth to memory. Processing more than one partition at a time may hurt cache locality to the point that you lose speed in any case.

Jerry Coffin 2010-06-07 13:57:03

+1 - though I have suspicions about the cache locality issue. Basically, so long as each thread is handling a substantial partition, most accesses are sequential, so there should be little scope for bad cache handling. That said, 50,000 integers is a bit on the small side (fits easily in a CPU cache, far too small to fully exploit outer layers of cache) but the page size is probably the real issue, but I have little idea how big that is these days. Still, if your memory bandwidth is fully utilized, multithreading cannot help - and could mean more code wasting cache space.

Steve314 2010-06-07 14:09:21

+1 - This sounds great --- as if it's borne from years of experience.

Jacob 2010-06-07 14:24:52

@Jacob:well, I've been doing this stuff for a long time anyway -- I just hope it's really years of experience, not one year of experience repeated a whole bunch of times...

Jerry Coffin 2010-06-07 14:29:34

Answer 2

+1 A:

First guess would be that creating, destroying, and especially the syncing your threads is going to eat up and possible gain you might receive depending on just how many elements you are sorting. I'd actually guess that it would take quite a long while to make up the overhead and that it probably won't ever be made up.

Because of the way you have your sort, you have 1 thread waiting for another waiting for another... you aren't really getting all that much parallelism to begin with. You'd be better off using a more linear sort, perhaps something like a Radix, that splits the threads up with more further data. That's still having one thread wait for others a lot, but at least the threads get to do more work in the mean time. But still, I don't think threads are going to help too much even with this.

Michael Dorgan 2010-06-07 13:57:12

Answer 3

+1 A:

I just have a quick look at your code. And i got a remark. Why are you using lock. If I understand what you are doing is something like:

quickSort(array)
{
    left, right = partition(array);
    newThread(quickSort(left));
    newThread(quickSort(right));
}

You shouldn't need lock. Normally each call to quick sort do not access the other part of the array. So no sharing is involve.

mathk 2010-06-07 14:08:28

Maybe he's returning indices to an existing array instead of creating a new subarray.

Jacob 2010-06-07 14:23:09

The lock is for incrementing the global variable which keeps track of how many threads are used.

Murat Ayfer 2010-06-07 15:04:26

I bet it would be better to just add a recursion level argument to the algorithm and create a new thread if and only if the recursion level is not excessively deep. This way you aren't accompanying every recursive call by a lock...which at a lower level means all your threads are in contention for read access to your thread_count variable. I'd try 1 level of recursion initially, even though this only makes 2 threads rather than your maximum.

Brian 2010-06-07 21:48:50

Why having a global variable for that you can do:quickSort(array, maxThread){...newThread(quickSort(left, floor(maxThread / 2));newThread(quickSort(right, maxThread - floor(maxThread / 2))));..}

mathk 2010-06-08 06:59:08

Answer 4

A:

Unless each thread is running on a separate processor or core they will not truly run concurrently and the context switch time will be significant. The number of threads should be restricted to the number of available execution units, and even then you have to trust the OS will distribute them to separate processors/cores, which it may not do if they are also being used for other processes.

Also you should use a static thread pool rather than creating and destroying threads dynamically. Creating/destroying a thread includes allocating/releasing a stack from the heap, which is non-deterministic and potentially time-consuming.

Finally are the 16 processors on the server real or VMs? And are they exclusively allocated to your process?

Clifford 2010-06-07 21:43:10

ansaurus

tags:

views:

answers:

Can't get any speedup from parallelizing Quicksort using Pthreads

related questions