Follow-up to my previous question: http://stackoverflow.com/questions/3579860/conditional-wait-with-pthreads
I changed my code to use semaphores instead of mutex locks and condition variables. However, I seem to have run into a situation that I cannot explain.
Here is an abstract version of the code:
function thread_work() {
    while (true) {
        sem_wait(new_work)
        if (condition to exit) {
            exit
        }
        while (work based condition) {
            if (condition to exit) {
                exit
            }
            do work
            if (condition to exit) {
                exit
            }
            sem_post(work_done)
            set condition to ready
        }
    }
    exit
}

function start_thread() {
    sem_wait(work_done)
    setup thread work
    create work
    sem_post(new_work)
    return to main()
}

function end_thread() {
    set condition to exit
    sem_post(new_work)
    pthread_join(thread)
    clean up
}
Explanation of the control flow: the main thread calls start_thread to create a thread and hand over some work. main and the worker then continue in parallel, and either one may finish its work first. If main finishes before the worker, the worker's job is no longer valid and the worker must be told to abort what it's doing; this is "condition to exit". start_thread does not create a thread every time it's called, only the first time. On later calls, it only updates the work for the existing thread.
The thread is reused and given new work parameters to reduce the overhead of creating and destroying threads. Once main decides that it no longer needs the worker thread, it calls end_thread. This function tells the thread it is no longer needed, waits for it to exit, and then cleans up the pointers, semaphores, and work structure.
The thread always waits on the semaphore new_work before starting its work. I use new_work to signal the thread that new work is available and it should start. The thread signals the control function (start_thread) that it has finished or aborted the work through the semaphore work_done.
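A minimal C sketch of the design described above, assuming Linux/POSIX unnamed semaphores (sem_init) and a single worker. It is not the original code: the struct worker type, the helpers sem_wait_retry(), should_exit() and worker_init(), and the job/steps_done fields are hypothetical stand-ins, and "do work" is replaced by a counter so that only the handshake between new_work, work_done and the mutex-protected exit flag is shown.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical worker state; every name here is illustrative. */
struct worker {
    pthread_t thread;
    bool started;            /* true once the thread has been created */
    sem_t new_work;          /* posted by start_thread() and end_thread() */
    sem_t work_done;         /* posted by the worker when a job is finished */
    pthread_mutex_t lock;    /* protects exit_requested */
    bool exit_requested;     /* written only by end_thread(), never reset */
    int job;                 /* hypothetical work parameter: number of steps */
    int steps_done;          /* hypothetical result: total steps executed */
};

/* Retry sem_wait() if a signal handler interrupts it. */
static void sem_wait_retry(sem_t *s) {
    while (sem_wait(s) == -1 && errno == EINTR)
        ;
}

/* "condition to exit", read under the mutex. */
static bool should_exit(struct worker *w) {
    pthread_mutex_lock(&w->lock);
    bool e = w->exit_requested;
    pthread_mutex_unlock(&w->lock);
    return e;
}

static void *thread_work(void *arg) {
    struct worker *w = arg;
    for (;;) {
        sem_wait_retry(&w->new_work);                /* new work or an exit request */
        if (should_exit(w))
            return NULL;
        for (int step = 0; step < w->job; step++) {  /* "work based condition" */
            if (should_exit(w))
                return NULL;                         /* abort: main no longer needs this job */
            w->steps_done++;                         /* placeholder for "do work" */
        }
        sem_post(&w->work_done);                     /* job finished; worker is idle again */
    }
}

static void worker_init(struct worker *w) {
    w->started = false;
    w->exit_requested = false;
    w->job = 0;
    w->steps_done = 0;
    int rc = sem_init(&w->new_work, 0, 0);
    assert(rc == 0);
    rc = sem_init(&w->work_done, 0, 1);   /* 1: idle, so the first start_thread() proceeds */
    assert(rc == 0);
    rc = pthread_mutex_init(&w->lock, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Hand a new job to the worker, creating the thread on the first call only. */
static int start_thread(struct worker *w, int job) {
    sem_wait_retry(&w->work_done);        /* wait until the previous job is finished */
    w->job = job;                         /* "setup thread work" */
    if (!w->started) {
        if (pthread_create(&w->thread, NULL, thread_work, w) != 0) {
            sem_post(&w->work_done);      /* undo the wait so a retry does not block */
            return -1;
        }
        w->started = true;
    }
    sem_post(&w->new_work);
    return 0;
}

/* Ask the worker to stop, wait for it, then release the resources. */
static void end_thread(struct worker *w) {
    if (w->started) {
        pthread_mutex_lock(&w->lock);
        w->exit_requested = true;         /* set under the mutex, before posting */
        pthread_mutex_unlock(&w->lock);
        sem_post(&w->new_work);           /* wake the worker if it is blocked in sem_wait() */
        pthread_join(w->thread, NULL);
    }
    sem_destroy(&w->new_work);
    sem_destroy(&w->work_done);
    pthread_mutex_destroy(&w->lock);
}
```

Usage: call worker_init() once, start_thread() for each job, and end_thread() when main no longer needs the worker. One design choice in this sketch: exit_requested is only ever set (by end_thread()) and never reset by the worker, so an exit request made while a job is running cannot be overwritten before the worker reads it after its next sem_wait().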
Everything works fine except in some random circumstances, where end_thread is stuck at pthread_join while the thread is stuck at sem_wait(new_work).
"condition to exit" is protected by a mutex.
I can't figure out what is causing this.
Here is the output from a trace:
thread 1: sem NEW count, before wait : 0
thread 1: sem NEW count, before wait : 0
end post: sem NEW count, before post : 0
end post: sem NEW count, after post : 1
thread 1 exit.
thread exited, cleanup 1
Entered initialization for thread: 2
created a thread: 2
thread: 2 started.
.....
thread 2: sem NEW count, before wait : 0
thread 2: sem NEW count, before wait : 0
thread 2: sem NEW count, before wait : 0
end post: sem NEW count, before post : 0
thread 2 exit.
end post: sem NEW count, after post : 0
thread exited, cleanup 2
Entered initialization for thread: 3
created a thread: 3
thread: 3 started.
.....
thread 3: sem NEW count, before wait : 0
thread 3: sem NEW count, before wait : 0
end post: sem NEW count, before post : 0
end post: sem NEW count, after post : 1
thread 3: sem NEW count, before wait : 0
At this point, the thread is waiting on the semaphore and end_thread is waiting at pthread_join.
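The count values in the trace look like they come from reading the semaphore just before or after each sem_wait()/sem_post(). Assuming that was done with sem_getvalue(), a minimal sketch of such a trace helper follows; trace_sem_value is a hypothetical name and the format string only mimics the trace lines. Two properties of sem_getvalue() matter when reading such a trace: POSIX only guarantees that the value was the semaphore's value at some unspecified moment during the call, so it may already be stale when printed; and when threads are blocked on the semaphore, POSIX allows the value to be reported either as 0 or as a negative count of waiters (Linux reports 0).

```c
#include <semaphore.h>
#include <stdio.h>

/* Hypothetical helper: formats one trace line such as
   "thread 3: sem NEW count, before wait : 0" into buf and returns the
   sampled value (-1 if sem_getvalue() failed). The value is a snapshot
   and may be stale by the time the caller prints buf. */
static int trace_sem_value(char *buf, size_t len, const char *who,
                           const char *when, sem_t *sem) {
    int value = -1;
    if (sem_getvalue(sem, &value) != 0)
        value = -1;
    snprintf(buf, len, "%s: sem NEW count, %s : %d", who, when, value);
    return value;
}
```

For example, an "after post : 0" line like the one for thread 2 can simply mean that the worker's sem_wait() consumed the post before sem_getvalue() sampled the count.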
Thank you for your time.