I'm writing a simple ray tracer in C++, using SDL for graphics and pthreads for threading. I can't get my program to utilize two cores: the threads work, they just don't drive both cores to 100%. To interface with SDL I write directly to its memory, SDL_Surface.pixels, so I assume it can't be SDL locking me.

My thread function looks like this:

void* renderLines(void* pArg){
    while(true){
        //Synchronize
        pthread_mutex_lock(&frame_mutex);
        pthread_cond_wait(&frame_cond, &frame_mutex);
        pthread_mutex_unlock(&frame_mutex);

        renderLinesArgs* arg = (renderLinesArgs*)pArg;
        for(int y = arg->y1; y < arg->y2; y++){
            for(int x = 0; x < arg->width; x++){
                Color C = arg->scene->renderPixel(x, y);
                putPixel(arg->screen, x, y, C);
            }
        }

        sem_post(&frame_rendered);
    }
}

Note: scene->renderPixel is const, so I assume both threads can read from the same memory. I have two worker threads running this function; in my main loop I set them to work using:

//Signal a new frame
pthread_mutex_lock(&frame_mutex);
pthread_cond_broadcast(&frame_cond);
pthread_mutex_unlock(&frame_mutex);

//Wait for workers to be done
sem_wait(&frame_rendered);
sem_wait(&frame_rendered);

//Unlock SDL surface and flip it...

Note: I've also tried creating and joining the threads instead of synchronizing them. I compile this with "-lpthread -D_POSIX_PTHREAD_SEMANTICS -pthread" and gcc does not complain.

My problem is best illustrated using a graph of the CPU usage during execution:

As can be seen from the graph, my program only uses one core at a time, switching between the two every once in a while, and it never drives both to 100%. What in the world have I done wrong? I'm not using any mutexes or semaphores in scene. What can I do to find the bug?

Also, if I put while(true) around scene->renderPixel() I can push both cores to 100%. So I suspected that this is caused by synchronization overhead, but I only synchronize every 0.5 second (e.g. FPS: 0.5), given a complex scene. I realize it might not be easy to tell me what my bug is, but an approach to debugging this would be great too. I haven't played with pthreads before.

Also, can this be a hardware or kernel issue? My kernel is:

$uname -a
Linux jopsen-laptop 2.6.27-14-generic #1 SMP Fri Mar 13 18:00:20 UTC 2009 i686 GNU/Linux


+1  A: 

I'd take a wild stab in the dark and say your worker threads are spending lots of time waiting on the condition variable. To get good CPU utilization in this kind of situation, where your code is mostly CPU bound, the usual approach is a task-oriented style of programming: you treat the threads as a "pool" and use a queue structure to feed work to them. They should spend a very small amount of time pulling work off the queue and most of their time doing the actual work.

What you have right now is a situation where they are probably doing work for a while, then notifying the main thread via the semaphore that they are done. The main thread will not release them until both threads have finished working on the frame they are currently processing.
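A minimal sketch of the task-oriented idea, under some loud assumptions: the "queue" is reduced to a shared atomic row counter (a swapped-in simplification, not a full thread pool), threads are created and joined per frame for brevity, and `Frame`, `worker`, `renderFrame` and the dummy `renderPixel` are hypothetical names standing in for the asker's scene code. Each worker claims the next unrendered row, so a thread that gets cheap rows simply claims more of them instead of idling.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for scene->renderPixel(); cost varies by row on purpose.
static int renderPixel(int x, int y) {
    int v = 0;
    for (int i = 0; i < (y % 7 + 1) * 10; ++i) v += (x ^ i) & 1;
    return v;
}

struct Frame {
    int width, height;
    std::atomic<int> nextRow;  // the "queue": index of the next unclaimed row
    std::vector<int> pixels;   // stand-in for SDL_Surface.pixels
    Frame(int w, int h) : width(w), height(h), nextRow(0), pixels(w * h) {}
};

static void* worker(void* p) {
    Frame* f = static_cast<Frame*>(p);
    for (;;) {
        int y = f->nextRow.fetch_add(1);  // claim one row; cheap compared to rendering it
        if (y >= f->height) break;        // no work left in this frame
        for (int x = 0; x < f->width; ++x)
            f->pixels[y * f->width + x] = renderPixel(x, y);
    }
    return nullptr;
}

// Renders one frame with nThreads workers; returns once every row is done.
static void renderFrame(Frame& f, int nThreads) {
    assert(nThreads > 0);
    f.nextRow = 0;
    std::vector<pthread_t> threads(nThreads);
    for (pthread_t& t : threads) {
        int rc = pthread_create(&t, nullptr, worker, &f);
        assert(rc == 0);
        (void)rc;
    }
    for (pthread_t& t : threads) pthread_join(t, nullptr);
}
```

Something like `Frame f(640, 480); renderFrame(f, 2);` would render one frame. Rows are disjoint, so the workers never write to the same pixel and need no lock around the pixel buffer.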

Since you are using C++, have you considered using Boost.Threads? It makes working with multithreaded code much easier, and the API is actually kind of similar to pthreads, but in a "modern C++" kind of way.

1800 INFORMATION
+1  A: 

I'm no pthreads guru, but it seems to me that the following code is wrong:

pthread_mutex_lock(&frame_mutex);
pthread_cond_wait(&frame_cond, &frame_mutex);
pthread_mutex_unlock(&frame_mutex);

To quote this article

pthread_cond_wait() blocks the calling thread until the specified condition is signalled. This routine should be called while mutex is locked, and it will automatically release the mutex while it waits. After signal is received and thread is awakened, mutex will be automatically locked for use by the thread. The programmer is then responsible for unlocking mutex when the thread is finished with it.

So it seems to me that you should be releasing the mutex after the block of code following the pthread_cond_wait.

anon
+2  A: 

This is useless:

pthread_mutex_lock(&frame_mutex);
pthread_cond_wait(&frame_cond, &frame_mutex);
pthread_mutex_unlock(&frame_mutex);

If you want to wait for a new frame, do something like:

int new_frame = 0;

First thread:

pthread_mutex_lock(&mutex); 
new_frame = 1; 
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);

Other thread:

pthread_mutex_lock(&mutex); 
while(new_frame == 0)
  pthread_cond_wait(&cond, &mutex); 
/* Here new_frame != 0, do things with the frame*/
pthread_mutex_unlock(&mutex);

pthread_cond_wait() actually releases the mutex and deschedules the thread until the condition is signaled. When the condition is signaled, the thread is woken up and the mutex is re-acquired. All of this happens inside the pthread_cond_wait() function.
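For the two-worker setup in the question, a single flag is not quite enough, because each worker has to see every frame exactly once. Here is a sketch of one way to extend the pattern, assuming a frame counter as the predicate and a second condition variable in place of the semaphore. All names (`mtx`, `frameNo`, `pending`, `rendered`, `runFrames`, ...) are hypothetical, and the rendering itself is reduced to a counter increment.

```cpp
#include <pthread.h>
#include <cassert>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t frameCond = PTHREAD_COND_INITIALIZER;  // "a new frame started"
static pthread_cond_t doneCond = PTHREAD_COND_INITIALIZER;   // "all workers finished"
static int frameNo = 0;           // predicate: incremented once per new frame
static int pending = 0;           // workers still busy with the current frame
static bool quit = false;
static int rendered[2] = {0, 0};  // frames rendered per worker (demo output)

static void* worker(void* arg) {
    int id = *static_cast<int*>(arg);
    int seen = 0;                         // last frame number this worker handled
    for (;;) {
        pthread_mutex_lock(&mtx);
        while (frameNo == seen && !quit)  // re-check: wakeups can be spurious or early
            pthread_cond_wait(&frameCond, &mtx);
        if (quit) { pthread_mutex_unlock(&mtx); return nullptr; }
        seen = frameNo;
        pthread_mutex_unlock(&mtx);       // do NOT hold the mutex while rendering

        ++rendered[id];                   // placeholder for the renderPixel loops

        pthread_mutex_lock(&mtx);
        if (--pending == 0) pthread_cond_signal(&doneCond);
        pthread_mutex_unlock(&mtx);
    }
}

// Main-thread side: start a frame and block until both workers are done.
static void renderOneFrame(int nWorkers) {
    pthread_mutex_lock(&mtx);
    pending = nWorkers;
    ++frameNo;
    pthread_cond_broadcast(&frameCond);
    while (pending > 0)
        pthread_cond_wait(&doneCond, &mtx);
    pthread_mutex_unlock(&mtx);
}

static void runFrames(int nFrames) {
    quit = false; frameNo = 0; rendered[0] = rendered[1] = 0;  // no threads exist yet
    pthread_t t[2];
    int ids[2] = {0, 1};
    for (int i = 0; i < 2; ++i) {
        int rc = pthread_create(&t[i], nullptr, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int f = 0; f < nFrames; ++f) renderOneFrame(2);
    pthread_mutex_lock(&mtx);
    quit = true;
    pthread_cond_broadcast(&frameCond);
    pthread_mutex_unlock(&mtx);
    for (int i = 0; i < 2; ++i) pthread_join(t[i], nullptr);
}
```

The point of the predicate is that a worker which reaches the wait after the broadcast has already happened still sees `frameNo != seen` and proceeds, instead of sleeping through the whole frame as it can with a bare pthread_cond_wait().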

Ben
This did help. I also discovered that rendering every second line, instead of half the image per thread, made the two threads finish in almost the same time. So I did eventually manage to drive both cores to 100%, but it didn't improve my frame rate :) Or I'm just measuring it wrong... Thanks for the help.
jopsen
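The "every second line" split mentioned above can be sketched as follows. `rowsForWorker` is a hypothetical helper, not part of the asker's code; it only illustrates which rows each worker would take when rows are interleaved rather than split into contiguous halves.

```cpp
#include <cassert>
#include <vector>

// Rows handled by worker `id` out of `nWorkers` when rows are interleaved:
// id, id + nWorkers, id + 2*nWorkers, ... up to (but excluding) height.
static std::vector<int> rowsForWorker(int id, int nWorkers, int height) {
    assert(nWorkers > 0 && id >= 0 && id < nWorkers);
    std::vector<int> rows;
    for (int y = id; y < height; y += nWorkers)
        rows.push_back(y);
    return rows;
}
```

With two workers and five rows, worker 0 gets rows 0, 2, 4 and worker 1 gets rows 1, 3. Expensive regions of the image tend to span neighbouring rows, so interleaving spreads them across both threads more evenly than a top-half/bottom-half split.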
Haha, the first "optimization" step is always to try to make the parallel algorithm as efficient with n processors as the sequential one was with a single processor. Keep trying, you will eventually get an improvement.
Ben