I'm learning pthreads and condition variables. As far as I can tell, a typical waiting thread looks like this:

pthread_mutex_lock(&m);
while(!condition)
     pthread_cond_wait(&cond, &m);
// Thread stuff here
pthread_mutex_unlock(&m);

What I can't understand is why the line while(!condition) is necessary even if I use pthread_cond_signal() to wake up the thread.

I can understand that if I use pthread_cond_broadcast() I need to test the condition, because I wake up all waiting threads, and one of them can make the condition false again before unlocking the mutex (thus transferring execution to another woken thread, which should not run at that point). But if I use pthread_cond_signal() I wake up just one thread, so the condition must be true. So the code could look like this:

pthread_mutex_lock(&m);
pthread_cond_wait(&cond, &m);
// Thread stuff here
pthread_mutex_unlock(&m);

I read something about spurious wakeups that may happen. Is this (and only this) the reason? Why would spurious wakeups occur? Or is there something else I'm missing?

I assume the signal code is like this:

pthread_mutex_lock(&m);
condition = true;
pthread_cond_signal(&cond); // Should wake up *one* thread
pthread_mutex_unlock(&m);
+7  A: 

Yes, the pthreads API allows spurious wakeups, and that's the only reason I can think of for needing to recheck the condition when using pthread_cond_signal.

Maybe a particular release on a particular platform will never cause spurious wakeups, but when the docs say you should recheck the condition, I'd do it.

Here's one way spurious wakeups can occur on Linux (taken from Wikipedia):

Spurious wakeup in Linux

The pthread_cond_wait() function in Linux is implemented using the futex system call. Each blocking system call on Linux returns abruptly with EINTR when the process receives a signal. A POSIX signal will therefore generate a spurious wakeup. This state is not trivial to fix due to 2 reasons:

  • Making signal delivery not interrupt system calls keeps the stack in use. If another system call is invoked during a userspace signal handling routine, and that system call is interrupted too, and so on, the kernel stack could run out quickly. Returning with EINTR keeps stack usage under control. glibc checks (or is supposed to check) for EINTR after every blocking system call. The futex data structure contains enough information to restart these calls.
  • pthread_cond_wait() can't restart the wait because it may miss a real wakeup in the short time it was outside the futex system call. This race condition can only be avoided by the caller checking for an invariant. To complicate matters further, the POSIX specification states "These functions will not return an error code of EINTR". According to POSIX, these functions return zero in case of a spurious wakeup.

In contradiction to POSIX, the LinuxThreads ("old" Linux threads) man page says that EINTR is returned when "pthread_cond_timedwait was interrupted by a signal". The recent version (NPTL) of pthread_cond_timedwait seems to keep to POSIX behaviour, though (returning zero on spurious wakeup).
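
As a rough illustration of why the recheck matters, here is a minimal sketch (mine, not from the quoted article). A genuine spurious wakeup can't be triggered on demand, so this imitates one by signalling the condition variable without setting the predicate. The names waiting, wakeups, wait_for_waiter and premature_wakeup_demo are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool condition = false; /* the predicate the waiter needs */
static bool waiting = false;   /* hypothetical: waiter has reached its loop */
static int wakeups = 0;        /* times pthread_cond_wait() has returned */

static void *waiter(void *arg) {
    bool *saw_condition = arg;
    pthread_mutex_lock(&m);
    waiting = true;
    while (!condition) {            /* recheck after every wakeup */
        pthread_cond_wait(&cond, &m);
        wakeups++;
    }
    *saw_condition = condition;     /* can only be true here */
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Poll until the waiter is inside pthread_cond_wait() and has already
   returned from it at least min_wakeups times. */
static void wait_for_waiter(int min_wakeups) {
    for (;;) {
        pthread_mutex_lock(&m);
        bool ready = waiting && wakeups >= min_wakeups;
        pthread_mutex_unlock(&m);
        if (ready)
            return;
        sched_yield();
    }
}

/* Returns 1 if the waiter was woken early at least once and still
   only proceeded once the condition was true. */
int premature_wakeup_demo(void) {
    pthread_t t;
    bool saw_condition = false;
    int rc = pthread_create(&t, NULL, waiter, &saw_condition);
    assert(rc == 0);
    (void)rc;

    wait_for_waiter(0);
    pthread_mutex_lock(&m);
    pthread_cond_signal(&cond);     /* wake the waiter, condition still false */
    pthread_mutex_unlock(&m);

    wait_for_waiter(1);             /* waiter woke, rechecked, waits again */
    pthread_mutex_lock(&m);
    condition = true;
    pthread_cond_signal(&cond);     /* the real wakeup */
    pthread_mutex_unlock(&m);

    pthread_join(t, NULL);
    return (saw_condition && wakeups >= 2) ? 1 : 0;
}
```

Call premature_wakeup_demo() from main() to try it. Without the while loop, the waiter would carry on after the first wakeup even though condition is still false.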

nos
+3  A: 

Suppose you don't check the condition. Then usually you can't avoid the following bad thing happening (at least, you can't avoid it in one line of code):

 Sender                             Receiver
locks mutex
sets condition
signals condvar, but nothing 
  is waiting so has no effect
releases mutex
                                    locks mutex
                                    waits. Forever.
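
That schedule can be forced deterministically by joining the sender before the receiver even starts. The sketch below is only an illustration under that assumption (late_receiver_demo is a made-up driver name); the receiver here tests the predicate before waiting, so it returns instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool condition = false;

static void *sender(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    condition = true;
    pthread_cond_signal(&cond);   /* nobody is waiting yet: no effect */
    pthread_mutex_unlock(&m);
    return NULL;
}

static void *receiver(void *arg) {
    bool *finished = arg;
    pthread_mutex_lock(&m);
    while (!condition)            /* already true, so we never block */
        pthread_cond_wait(&cond, &m);
    *finished = true;
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Returns 1 if the receiver completed even though the signal was
   sent before it started. */
int late_receiver_demo(void) {
    pthread_t s, r;
    bool finished = false;
    int rc = pthread_create(&s, NULL, sender, NULL);
    assert(rc == 0);
    pthread_join(s, NULL);        /* sender is completely done first */
    rc = pthread_create(&r, NULL, receiver, &finished);
    assert(rc == 0);
    (void)rc;
    pthread_join(r, NULL);
    return finished ? 1 : 0;
}
```

If receiver() called pthread_cond_wait() unconditionally, the join on it would never return in this schedule, which is exactly the hang in the table above.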

Of course your second code example could avoid this by doing:

pthread_mutex_lock(&m);
if (!condition) pthread_cond_wait(&cond, &m);
// Thread stuff here
pthread_mutex_unlock(&m);

Then it would certainly be the case that if there is only ever at most one receiver, and if cond_signal were the only thing that could wake it up, then it would only ever wake up when the condition was set and hence would not need a loop. noselasd covers why the second "if" isn't true.
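
To show why the "at most one receiver" caveat matters, here is a hedged sketch with several receivers competing for a single item. The names items, done, consumed and multi_receiver_demo are invented for the example, and which receiver wins depends on scheduling. The point is only that each receiver rechecks the predicate after waking, so exactly one of them takes the item.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_RECEIVERS 4

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int items = 0;      /* hypothetical shared resource count */
static bool done = false;  /* tells the remaining receivers to give up */
static int consumed = 0;   /* how many receivers actually got an item */

static void *receiver(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    /* Another receiver may have taken the item between the wakeup
       and this thread reacquiring the mutex, so loop on the predicate. */
    while (items == 0 && !done)
        pthread_cond_wait(&cond, &m);
    if (items > 0) {
        items--;
        consumed++;
    }
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Returns the number of receivers that consumed the single item. */
int multi_receiver_demo(void) {
    pthread_t t[NUM_RECEIVERS];
    for (int i = 0; i < NUM_RECEIVERS; i++) {
        int rc = pthread_create(&t[i], NULL, receiver, NULL);
        assert(rc == 0);
        (void)rc;
    }

    pthread_mutex_lock(&m);
    items = 1;                      /* only one receiver can succeed */
    pthread_cond_broadcast(&cond);  /* but every waiter is woken */
    pthread_mutex_unlock(&m);

    pthread_mutex_lock(&m);
    done = true;                    /* release the receivers that lost */
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&m);

    for (int i = 0; i < NUM_RECEIVERS; i++)
        pthread_join(t[i], NULL);
    return consumed;
}
```

With a plain "if" in place of the loop, a receiver woken by the first broadcast could return from pthread_cond_wait() after the item was already taken and would have no way of noticing short of checking again.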

Steve Jessop
I see, so an "if" is needed for a logical reason (avoiding an endless wait), but a "while" is actually needed because of implementation issues (spurious wakeups).
happy_emi