ansaurus

Question

Not locking mutex for pthread_cond_timedwait and pthread_cond_signal ( on Linux )

Answer 1

+6 A:

The first is not OK:

The pthread_cond_timedwait() and pthread_cond_wait() functions shall block on a condition variable. They shall be called with mutex locked by the calling thread or undefined behavior results.

http://opengroup.org/onlinepubs/009695399/functions/pthread_cond_timedwait.html

The reason is that the implementation may want to rely on the mutex being locked in order to safely add you to a waiter list. And it may want to release the mutex without first checking it is held.

The second is disturbing:

if predictable scheduling behaviour is required, then that mutex is locked by the thread calling pthread_cond_signal() or pthread_cond_broadcast().

http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_cond_signal.html

Off the top of my head, I'm not sure what the specific race condition is that messes up scheduler behaviour if you signal without taking the lock. So I don't know how bad the undefined scheduler behaviour can get: for instance maybe with broadcast the waiters just don't get the lock in priority order (or however your particular scheduler normally behaves). Or maybe waiters can get "lost".

Generally, though, with a condition variable you want to set the condition (at least a flag) and signal, rather than just signal, and for this you need to take the mutex. The reason is that otherwise, if you're concurrent with another thread calling wait(), then you get completely different behaviour according to whether wait() or signal() wins: if the signal() sneaks in first, then you'll wait for the full timeout even though the signal you care about has already happened. That's rarely what users of condition variables want, but may be fine for you. Perhaps this is what the docs mean by "unpredictable scheduler behaviour" - suddenly the timeslice becomes critical to the behaviour of your program.

Btw, in Java you have to have the lock in order to notify() or notifyAll():

This method should only be called by a thread that is the owner of this object's monitor.

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#notify()

The Java synchronized {/}/wait/notify/notifyAll behaviour is analogous to pthread_mutex_lock/pthread_mutex_unlock/pthread_cond_wait/pthread_cond_signal/pthread_cond_broadcast, and not by coincidence.

Steve Jessop 2009-06-16 18:42:22

You might want to fix the link to the JavaDoc... the parentheses need to be part of the URL.

Chris Arguin 2009-06-16 19:47:12

Java throws if the monitor is not owned by the calling thread.

Arkadiy 2009-06-16 20:39:30

I think all it means is that when you modify the condition, then release the lock, another thread (call it B) can grab the mutex, check the condition and see that it is signalled, and go act on that. Meanwhile the cond_signal() is firing and the other thread (call it A) that was going to be able to check the condition blocks trying to lock the mutex. When A finally does grab the mutex, it sees that B already reset the condition and goes back to sleep again in cond_wait. If all threads are "equal" it is harmless, but in theory thread A could get starved.

Greg Rogers 2009-06-16 21:32:36

Answer 2

+2 A:

The point of waiting on a condition variable paired with a mutex is to atomically enter the wait and release the lock, i.e. allow other threads to modify the protected state, and then to atomically receive notification of the state change and re-acquire the lock. What you describe can be done with many other methods like pipes, sockets, signals, or - probably the most appropriate - semaphores.

Nikolai N Fetissov 2009-06-16 18:43:57

Answer 3

+1 A:

I think this should work (note untested code):

#include <errno.h>
#include <semaphore.h>
#include <time.h>

// initialize a semaphore
sem_t sem;
sem_init(&sem,
    0, // not shared between processes
    0  // initial value of 0
    );


// thread A (msecs is the timeout in milliseconds)
struct timespec tm;
int rc;

// sem_timedwait() takes an absolute CLOCK_REALTIME deadline
clock_gettime(CLOCK_REALTIME, &tm);
tm.tv_sec  += msecs / 1000;
tm.tv_nsec += (msecs % 1000) * 1000000L;
if(tm.tv_nsec >= 1000000000L) {
    tm.tv_nsec -= 1000000000L;
    tm.tv_sec++;
}

// wait until timeout or woken up, retrying if interrupted by a signal
while((rc = sem_timedwait(&sem, &tm)) == -1 && errno == EINTR) {
    continue;
}

return rc == -1 && errno == ETIMEDOUT; // returns true if a timeout occurred


// thread B
sem_post(&sem); // wake up Thread A early

Evan Teran 2009-06-16 18:44:43

+1, although beware that sem_timedwait is optional. By which I mean more optional than semaphores, which themselves are also optional but there's not much point having pthreads without them...

Steve Jessop 2009-06-16 19:18:31

Answer 4

A:

"unpredictable scheduling behavior" means just that. You don't know what's going to happen. Nor does the implementation. It could work as expected. It could crash your app. It could work fine for years, then a race condition makes your app go monkey. It could deadlock.

Basically, if any docs suggest that anything undefined/unpredictable can happen unless you do what the docs tell you to do, you had better do it. Otherwise stuff might blow up in your face. (And it won't blow up until you put the code into production, just to annoy you even more. At least that's my experience.)

nos 2009-06-16 20:02:32

I don't agree. The meaning of "unpredictable scheduler behavior" is ambiguous, but I'm pretty certain the intention is not to imply "undefined behavior", or they'd have said that as they do in pthread_cond_wait.

Steve Jessop 2009-06-17 13:03:33

Answer 5

A:

Butenhof's excellent "Programming with POSIX Threads" discusses this right at the end of chapter 3.3.3.

Basically, signalling the condvar without locking the mutex is a potential performance optimisation: if the signalling thread has the mutex locked, then the thread waking on the condvar has to immediately block on the mutex that the signalling thread has locked even if the signalling thread is not modifying any of the data the waiting thread will use.

The reason that "unpredictable scheduler behavior" is mentioned is that if you have a high-priority thread waiting on the condvar (which another thread is going to signal, waking up the high-priority thread), any other lower-priority thread can come and lock the mutex, so that when the condvar is signalled and the high-priority thread is awakened, it has to wait for the lower-priority thread to release the mutex. If the mutex is locked whilst signalling, then the higher-priority thread will be scheduled on the mutex before the lower-priority thread: basically you know that when you "awaken" the high-priority thread it will awaken as soon as the scheduler allows it (of course, you might have to wait on the mutex before signalling the high-priority thread, but that's a different issue).
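The two orderings can be sketched side by side. This is only an illustration with made-up names (`work_items`, `produce_signal_locked`, `produce_signal_unlocked`, `consumer`), not code from the book. Both variants change the shared state under the mutex; they differ only in where `pthread_cond_signal()` is called.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int work_items = 0;           /* hypothetical shared state, protected by lock */

/* Variant 1: signal while holding the mutex. The woken thread queues on the
   mutex before any later arrival, at the cost of blocking until we unlock. */
static void produce_signal_locked(void)
{
    pthread_mutex_lock(&lock);
    work_items++;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* Variant 2: signal after unlocking. The woken thread need not block on the
   mutex, but another thread may take the mutex (and the work item) first. */
static void produce_signal_unlocked(void)
{
    pthread_mutex_lock(&lock);
    work_items++;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&cond);
}

/* Waiter: takes one work item and increments the counter passed in arg. */
static void *consumer(void *arg)
{
    int *taken = arg;

    assert(taken != NULL);
    pthread_mutex_lock(&lock);
    while (work_items == 0)          /* re-check: another thread may have won the race */
        pthread_cond_wait(&cond, &lock);
    work_items--;
    (*taken)++;
    pthread_mutex_unlock(&lock);
    return NULL;
}
```

Because the waiter loops on `work_items`, both variants are correct. The choice only affects which thread gets the mutex first after the signal.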

TheJuice 2010-02-19 17:51:46
