I have been tinkering with BSP trees for a while now and am also playing with threads. When adding a triangle to a BSP tree, an opportunity arises to create a new thread for the purposes of processing data in parallel.

insert(triangle, bspnode)
{
  ....
  else if(triangle spans bspnode)
  {
    (frontpiece, backpiece) = plane_split(triangle, bspnode)

    insert(frontpiece, bspnode.front)
    insert(backpiece, bspnode.back)
  }
  ....
}

The two insert operations above could be executed by two threads, and since they do not modify the same data, cheap synchronization can be used.

insert(triangle, bspnode)
{
  ....
  else if(triangle spans bspnode)
  {
    (frontpiece, backpiece) = plane_split(triangle, bspnode)

    handle = beginthread(insert(backpiece, bspnode.back))
    insert(frontpiece, bspnode.front)
    if(handle)
    {
      waitforthread(handle)
    }
    else
    {
      insert(backpiece, bspnode.back)
    }
  }
  ....
}

This new method attempts to create a thread to complete the operation in parallel, but should not fail if the thread cannot be created (it will simply revert to the original algorithm).
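A minimal C++ sketch of that "try to spawn, otherwise run it yourself" pattern, assuming C++11 `std::thread` in place of the `beginthread`/`waitforthread` pseudocode. The helper name `run_both` is hypothetical, and the sketch assumes neither callable throws.

```cpp
#include <functional>
#include <system_error>
#include <thread>

// Hypothetical helper (not from the original post): runs `a` and `b`,
// trying to overlap them on two threads. If the thread cannot be
// created, std::thread's constructor throws std::system_error and we
// fall back to running both calls one after the other.
// Assumes neither callable throws. Returns true if a thread was used.
inline bool run_both(const std::function<void()>& a,
                     const std::function<void()>& b)
{
    std::thread worker;
    bool spawned = false;
    try {
        worker = std::thread(a);   // may throw std::system_error
        spawned = true;
    } catch (const std::system_error&) {
        // No thread available: `a` runs sequentially below.
    }

    b();                           // the calling thread does the other half

    if (spawned) {
        worker.join();             // wait for the parallel half
    } else {
        a();                       // sequential fallback
    }
    return spawned;
}
```

In the BSP case, `a` would wrap the insert of the back piece and `b` the insert of the front piece. Note that nothing here limits how many threads the recursion creates in total.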

Is this a sound programming practice, or am I using threads improperly? I have not been able to find any literature on this technique. I like that it tends to use my CPU to its fullest (2 cores), and would theoretically scale to any number of processors available. I don't like that it might be horribly wasteful on CPU and memory.

+4  A: 

Threads are great if some part of the processing is waiting on something external (user input, I/O, some other processing) - the thread that's waiting can continue to wait, while a thread that isn't waiting forges on ahead.

However, for processing-intensive tasks, having more threads than processors just creates overhead. Your threads appear to be doing pure CPU work, so I'd stick to one thread per core, though you should test to find the optimal number.

The biggest overhead comes from context switching (freezing one thread and loading the execution context of the next one), along with cache misses when threads work on different regions of memory (which matters if a single thread could otherwise use the CPU cache effectively).
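As a rough illustration of "one thread per core", here is a sketch that sizes its worker count from `std::thread::hardware_concurrency()`. The function `parallel_sum` and the summing workload are assumptions chosen only to keep the example small; they are not part of the question's BSP code.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using at most one thread per hardware core.
// hardware_concurrency() may return 0 when the count is unknown,
// so we fall back to a single worker in that case.
inline long long parallel_sum(const std::vector<int>& data)
{
    const unsigned cores = std::thread::hardware_concurrency();
    std::size_t workers = std::max<std::size_t>(1, cores);
    workers = std::min(workers, std::max<std::size_t>(1, data.size()));

    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        // Each worker writes only its own slot, so no locking is needed.
        threads.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Whether the core count really is the best worker count for a given workload is something to measure, as the answer says.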

Philip Rieck
Oh and he RHYMES! :) HAHAH NICE!
Lirik
+2  A: 

Your best bet would be to create a thread pool, and then use it 'transparently' to add nodes.

E.g., create 2 threads at program start and have them wait on a semaphore or event. When you have nodes to add, you push the data onto a queue and then trigger the semaphore. This wakes one of the threads, which pops the data off the queue and performs the processing. (Make sure access to the queue is thread-safe; fully synchronised with a critical section is best.)

The total amount of work your app does goes up, since you have more overhead in copying data to the queue and running the extra threads, but if you used to run on a single core you will now be running on 2. It works best if the threaded processing is expensive.
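The pool described above could look roughly like this in C++11. This is a sketch under assumptions: the class name `WorkerPool` and its members are made up for illustration, a `std::condition_variable` stands in for the semaphore or event, and a `std::mutex` stands in for the critical section.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool (illustrative names, not a library API).
class WorkerPool {
public:
    explicit WorkerPool(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the remaining jobs, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Pushes a job onto the queue and wakes one sleeping worker.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;   // stopping and nothing left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();   // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

A caller would create one pool at program start, for example `WorkerPool pool(2);`, and pass each piece of work to `pool.submit(...)` as a lambda. Knowing when a particular job has finished needs extra bookkeeping that this sketch leaves out.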

gbjbaanb
A: 

Sure. For example, Quicksort can be made multithreaded quite easily, with some large performance gains on multi-core systems and some small performance losses on systems without multiple cores. Just remember that you're now adding overhead twice: once for the stack save on each recursion and once for each thread. If you're doing a large number of recursions, it could overwhelm a system faster than a non-multithreaded approach would.
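One common way to keep that per-thread overhead bounded is to spawn only near the top of the recursion. The sketch below assumes C++11 `std::async` and a hypothetical `depth` parameter: while `depth` is positive the left part goes to another thread, and below that the sort is purely sequential.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Sorts data[lo, hi). While `depth` > 0 the left part is handed to
// std::async; once depth reaches 0 the recursion is sequential, so at
// most 2^depth branches are being sorted at the same time.
inline void parallel_quicksort(std::vector<int>& data,
                               std::size_t lo, std::size_t hi, int depth)
{
    if (hi - lo < 2) return;

    // Three-way partition around the middle element's value:
    // [lo, left_end) < pivot, [left_end, right_begin) == pivot, rest > pivot.
    const int pivot = data[lo + (hi - lo) / 2];
    auto first = data.begin() + lo;
    auto last = data.begin() + hi;
    auto mid1 = std::partition(first, last,
                               [pivot](int x) { return x < pivot; });
    auto mid2 = std::partition(mid1, last,
                               [pivot](int x) { return x == pivot; });
    const auto left_end = static_cast<std::size_t>(mid1 - data.begin());
    const auto right_begin = static_cast<std::size_t>(mid2 - data.begin());

    if (depth > 0) {
        // The two halves touch disjoint elements, so no locking is needed.
        auto left = std::async(std::launch::async, [&data, lo, left_end, depth] {
            parallel_quicksort(data, lo, left_end, depth - 1);
        });
        parallel_quicksort(data, right_begin, hi, depth - 1);
        left.get();   // wait for the other half
    } else {
        parallel_quicksort(data, lo, left_end, 0);
        parallel_quicksort(data, right_begin, hi, 0);
    }
}
```

Calling `parallel_quicksort(v, 0, v.size(), 1)` uses up to two threads, and a depth of 2 uses up to four. The right depth for a given machine is something to measure rather than assume.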

tloach
A: 

Hi,

I am trying to do a similar thing with Quicksort, and I do see some speedup with 2 threads. Now I want to see the gain with 4 threads on a multicore machine, and I have been stuck on how to implement this for weeks now. The problem is that I want to spawn the threads outside the recursion in order to avoid overhead, and once 4 threads have been spawned I should be able to do a normal sequential sort from there on. How can I do this outside the recursion?

My Qsort looks something like this with 2 threads:

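Qsort()
{
    Qsortobj.partition();
    // Start the right half on its own thread before sorting the left half,
    // so that the two halves can run at the same time.
    pthread_create(&thread1, NULL, &QsortImpl::thread_fun, new thread_fun_args(this, &Qsortobj));
    Qsortobj.leftsort();
    pthread_join(thread1, NULL);
}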

The 'thread_fun' function redirects the thread to 'rightsort()' and both the left and right sorts recursively call Qsort_sequential().

How could I implement 4 threads to do this without putting them inside recursion?
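One possible shape for this is to do the splitting up front, without recursion, and only then start the threads. The sketch below is an assumption-laden illustration rather than a drop-in for the code above: it uses C++11 `std::thread` instead of pthreads, `std::nth_element` stands in for the quicksort partition step (to get balanced quarters), and `std::sort` stands in for the sequential sort. The names `split_in_half` and `four_way_sort` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

using Iter = std::vector<int>::iterator;

// Rearranges [first, last) so that everything left of the returned
// iterator is <= everything from that iterator onwards.
inline Iter split_in_half(Iter first, Iter last)
{
    Iter mid = first + (last - first) / 2;
    std::nth_element(first, mid, last);
    return mid;
}

inline void four_way_sort(std::vector<int>& data)
{
    // Two rounds of splitting, done sequentially and without recursion.
    Iter b0 = data.begin(), b4 = data.end();
    Iter b2 = split_in_half(b0, b4);
    Iter b1 = split_in_half(b0, b2);
    Iter b3 = split_in_half(b2, b4);
    const Iter bounds[5] = {b0, b1, b2, b3, b4};

    // Exactly four threads, each running a plain sequential sort on
    // its own quarter. The quarters are disjoint, so no locking.
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < 4; ++i) {
        const Iter lo = bounds[i], hi = bounds[i + 1];
        threads.emplace_back([lo, hi] { std::sort(lo, hi); });
    }
    for (auto& t : threads) t.join();
}
```

Because each quarter already holds values no larger than those in the next quarter, the array is fully sorted once all four threads have joined. The splitting itself is still single-threaded in this sketch.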

Aki
