ansaurus

Question

Why do pthreads’ condition variable functions require a mutex?

Answer 1

+6 A:

It's just the way that condition variables were initially implemented in DEC threading, the precursor to pthreads.

The mutex was originally used to protect the condition variable itself. That's why you need it locked before you do a wait.

The wait will "atomically" unlock the mutex, allowing others access to the condition variable (for signalling). Then when the condition variable is signalled or broadcast to, one or more of the threads on the waiting list will be woken up and the mutex will be magically locked again for that thread.

You typically see the following operation with condition variables, illustrating how they work. The following example is a worker thread which is given work via a signal to a condition variable.

thread:
    initialise.
    lock mutex.
    while thread not told to stop working:
        wait on condvar using mutex.
        if work is available to be done:
            do the work.
    unlock mutex.
    clean up.
    exit thread.

The work is done within this loop provided that there is some available when the wait returns. When the thread has been flagged to stop doing work (usually by another thread setting the exit condition then kicking the condition variable to wake this thread up), the loop will exit, the mutex will be unlocked and this thread will exit.
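As a rough illustration, the worker loop above might look like this in C. This is only a sketch: the names `pending`, `done`, `stop_requested` and `run_single_consumer` are assumptions made for the example, not part of the original answer, and the work itself is reduced to decrementing a counter. One deliberate difference from the pseudocode is that the worker re-checks its state before every wait, so a signal sent before it starts waiting is not lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* All names below are illustrative assumptions, not from the original answer. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int pending = 0;             /* units of work waiting; protected by mtx */
static int done = 0;                /* units of work completed; protected by mtx */
static bool stop_requested = false; /* exit condition; protected by mtx */

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    for (;;) {
        /* Sleep only while there is nothing to do and no request to stop. */
        while (pending == 0 && !stop_requested)
            pthread_cond_wait(&cv, &mtx);
        if (pending == 0)
            break;                  /* stop requested and no work left */
        pending--;                  /* "do the work" with the mutex held */
        done++;
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Starts one worker, hands it `jobs` units of work, then stops it.
   Intended to be called once, since the shared state is not reset. */
int run_single_consumer(int jobs) {
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    for (int i = 0; i < jobs; i++) {
        pthread_mutex_lock(&mtx);
        pending++;                  /* flag work as available */
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mtx);
    }
    pthread_mutex_lock(&mtx);
    stop_requested = true;          /* set the exit condition, then kick the worker */
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mtx);
    pthread_join(t, NULL);
    assert(pending == 0);           /* the worker drained everything before exiting */
    return done;
}
```

Calling `run_single_consumer(5)` from a `main` function should return 5, because the worker only exits once the stop flag is set and no work remains.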

The code above is a single-consumer model as the mutex remains locked while the work is being done. For a multi-consumer variation, you can use, as an example:

thread:
    initialise.
    lock mutex.
    while thread not told to stop working:
        wait on condvar using mutex.
        if work is available to be done:
            copy work to thread local storage.
            unlock mutex.
            do the work.
            lock mutex.
    unlock mutex.
    clean up.
    exit thread.

which allows other consumers to receive work while this one is doing work.
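A possible C rendering of the multi-consumer variant is sketched below. The fixed-size queue, the names `QUEUE_CAP`, `NUM_WORKERS` and `run_multi_consumer`, and the choice of squaring an integer as the "work" are all assumptions for the sake of the example. The point it tries to show is that the item is copied into a local variable and the mutex is released while the work is done.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAP 64     /* assumed capacity, for illustration only */
#define NUM_WORKERS 4    /* assumed number of consumers */

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int queue[QUEUE_CAP];        /* protected by mtx */
static int head = 0, tail = 0;      /* queue is empty when head == tail */
static bool stop_requested = false; /* protected by mtx */
static long total = 0;              /* sum of results; protected by mtx */

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    for (;;) {
        while (head == tail && !stop_requested)
            pthread_cond_wait(&cv, &mtx);
        if (head == tail)
            break;                       /* stop requested and queue drained */
        int item = queue[head++];        /* copy the work to a local variable */
        pthread_mutex_unlock(&mtx);      /* let other consumers take work */
        long result = (long)item * item; /* do the work without the mutex */
        pthread_mutex_lock(&mtx);
        total += result;
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Queues the integers 1..jobs, lets the workers square them, returns the sum.
   Intended to be called once, since the shared state is not reset. */
long run_multi_consumer(int jobs) {
    pthread_t workers[NUM_WORKERS];
    int started = 0;
    if (jobs < 0) jobs = 0;
    if (jobs > QUEUE_CAP) jobs = QUEUE_CAP;
    while (started < NUM_WORKERS &&
           pthread_create(&workers[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 1; i <= jobs; i++) {
        pthread_mutex_lock(&mtx);
        queue[tail++] = i;
        pthread_cond_signal(&cv);        /* wake one waiting consumer */
        pthread_mutex_unlock(&mtx);
    }
    pthread_mutex_lock(&mtx);
    stop_requested = true;
    pthread_cond_broadcast(&cv);         /* wake every consumer so all can exit */
    pthread_mutex_unlock(&mtx);
    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);
    assert(started == 0 || head == tail);
    return total;
}
```

With ten jobs the result should be 385 (the sum of the squares of 1 to 10), regardless of which worker handles which item.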

The condition variable relieves you of the burden of polling some condition, instead allowing another thread to notify you when something needs to happen. Another thread can tell that thread that work is available as follows:

lock mutex.
flag work as available.
signal condition variable.
unlock mutex.

The vast majority of what are often erroneously called spurious wakeups happened because multiple threads had been signalled within their pthread_cond_wait call (broadcast): one would return with the mutex, do the work, then re-wait.

Then the second signalled thread could come out when there was no work to be done. So you had to have an extra variable indicating that work should be done (this was inherently mutex-protected with the condvar/mutex pair here - other threads needed to lock the mutex before changing it however).

It was technically possible for a thread to return from a condition wait without being kicked by another thread (this is a genuine spurious wakeup) but, in all my many years working on pthreads, both in development/service of the code and as a user of them, I never once received one of these. Maybe that was just because HP had a decent implementation :-)

In any case, the same code that handled the erroneous case also handled genuine spurious wakeups as well since the work-available flag would not be set for those.
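The broadcast scenario described above can be reproduced with a small C sketch. Everything here is an assumption made for illustration: the names `work_cv`, `state_cv`, `waiting`, `empty_wakeups` and `run_broadcast_demo`, and the extra condition variable that lets the main thread see when all waiters are parked. One item of work is broadcast to several waiters; exactly one of them processes it, and the others wake up, find the flag clear and simply go back to waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_WAITERS 3  /* assumed number of waiting threads */

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cv = PTHREAD_COND_INITIALIZER;
static pthread_cond_t state_cv = PTHREAD_COND_INITIALIZER; /* progress reports to main */
static bool work_available = false; /* all of these are protected by mtx */
static bool stop_requested = false;
static int waiting = 0;             /* threads that have registered as waiters */
static int processed = 0;           /* wakeups that found work */
static int empty_wakeups = 0;       /* wakeups that found nothing to do */

static void *waiter(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    while (!stop_requested) {
        waiting++;
        pthread_cond_signal(&state_cv);     /* tell main we are about to wait */
        pthread_cond_wait(&work_cv, &mtx);
        waiting--;
        if (work_available) {
            work_available = false;         /* take the single item */
            processed++;
            pthread_cond_signal(&state_cv); /* tell main the item was handled */
        } else if (!stop_requested) {
            empty_wakeups++;                /* woken, but another thread got the work */
        }
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Returns the number of items processed; stores the empty-wakeup count
   in *empty_out if it is non-NULL. Intended to be called once. */
int run_broadcast_demo(int *empty_out) {
    pthread_t t[NUM_WAITERS];
    int started = 0;
    while (started < NUM_WAITERS &&
           pthread_create(&t[started], NULL, waiter, NULL) == 0)
        started++;
    if (started == 0)
        return -1;
    pthread_mutex_lock(&mtx);
    while (waiting < started)               /* wait until every thread is parked */
        pthread_cond_wait(&state_cv, &mtx);
    work_available = true;                  /* one item, but wake everybody */
    pthread_cond_broadcast(&work_cv);
    while (processed == 0)
        pthread_cond_wait(&state_cv, &mtx);
    stop_requested = true;
    pthread_cond_broadcast(&work_cv);
    pthread_mutex_unlock(&mtx);
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    if (empty_out)
        *empty_out = empty_wakeups;
    assert(!work_available);
    return processed;
}
```

The processed count should always be 1. The number of empty wakeups depends on scheduling, so the sketch reports it without promising a particular value.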

paxdiablo 2010-05-04 08:12:22

+1 for the pattern of how to use mutexes and condvars; that's how it is *always* done. (Well, apart from when it is done wrongly…)

Donal Fellows 2010-05-04 08:23:13

'do something ' shouldn't be inside the while loop. You'd want your while loop to just check the condition, otherwise you might also 'do something' if you get a spurious wakeup.

nos 2010-05-04 08:30:12

Well, yes, you need to check error condition, I'd think that would go without saying. But, assuming there were none, you would have the mutex and it would be safe to "do something". I'll clarify.

paxdiablo 2010-05-04 09:08:49

No, error handling is secondary to this. With pthreads, you can be woken up for no apparent reason (a spurious wakeup) and without any error. Thus you need to recheck 'some condition' after you're woken up.

nos 2010-05-04 09:41:48

I’m not sure I understand. I had the same reaction as **nos**; why is `do something` inside the `while` loop?

elliottcable 2010-05-04 09:42:14

Because that's when you've been signalled with the condition variable. nos is right that your thread can wake up with no work to be done (it was never spurious by the way, what would happen is that it was possible for two threads to be wakened _within_ their cond_wait then one would return with the mutex and do the work, then when it rewaited on the condition, the second would return and no work would be there for it). I consider that an error condition hence my changes. Obviously further clarification is needed.

paxdiablo 2010-05-04 10:26:48

Perhaps I'm not making it clear enough. The loop is _not_ to wait for work to be ready so you can do it. The loop is the main "infinite" work loop. If you return from cond_wait and the work flag is set, you do the work then loop around again. "while some condition" will only be false when you want the thread to stop doing work at which point it will release the mutex and most likely exit.

paxdiablo 2010-05-04 10:43:45

Ahhhhhhhhhhh I see. Thanks for the clarification, that was a little ambiguous. +1’d now, though you should edit it and make that clearer.

elliottcable 2010-05-04 12:59:16

A new problem, since your last edit: Um, you lock/unlock the mutex in the consumer *outside* the work loop. That means, except when it’s blocked due to the condvar-wait, the mutex would always be locked… so how could you have multiple consumers? Shouldn’t, then, the lock/unlock be inside the loop?

elliottcable 2010-05-04 13:08:44

No, the cond_wait unlocks the mutex automatically and re-locks it before returning. While the thread is within the cond_wait call, it does not have the mutex locked.

paxdiablo 2010-05-04 13:35:56

Yes, but I’m saying that that code would prevent two ‘worker threads’ from being active at once, working on different elements of the posited queue of work-to-be-done. Right?

elliottcable 2010-05-04 13:44:17

Yes it would (sorry, I misunderstood your last comment), the example is a clear multi-producer, single-consumer model. It's easy enough to move to a multi-consumer option if you copy the work items and release the mutex before doing the work, claiming it again afterwards. That's just a minor mod.

paxdiablo 2010-05-04 13:49:45

Another picky bit about your latest modified pseudocode: Unless I’m mistaken, it seems that only one thread can ever be waiting on the condvar at a given time (the others will be locked against the mutex, before beginning to wait on the condvar); doesn’t that sort of defeat the point? i.e. having multiple threads waiting on the condvar, and then `pthread_cond_signal()`ing against it, to cause just one of those to wake up and take a piece of work.

elliottcable 2010-05-05 07:25:46

I’m curious what you think of my implementation so far. Since you can’t exactly paste code in comments: http://gist.github.com/390498

elliottcable 2010-05-05 07:43:07

@elliot, while a thread is within the condvarwait, it doesn't have the mutex locked (see the third paragraph in my answer). So your contention that other threads would be waiting on the mutex is not correct. It wouldn't matter even if that were the case since, in the multi-consumer model, the thread would unlock the mutex as soon as it began the work (after copying to thread-local storage), allowing another thread to enter condvarwait. But that isn't actually the case (as explained at the start of this comment) so it doesn't matter.

paxdiablo 2010-05-05 08:28:49

Re the code review, I suggest you just try it under load. If you find a problem, I'd be happy to look at any specifics then. I'm happy to help out with specific questions and problems but my day job unfortunately precludes me from large-effort code reviews. Anyway, I _hate_ code reviews almost as much as I hate documentation :-)

paxdiablo 2010-05-05 08:32:08

No problem, man. You’ve already been a huge, huge help. I was just trying to figure out if my application of the mutex/condvar pattern was, you know, ‘correct’ (for a given definition of ‘correct.’)

elliottcable 2010-05-05 09:46:05

Answer 2

+1 A:

The mutex must be locked when you call pthread_cond_wait; the call atomically unlocks the mutex and blocks on the condition. Once the condition is signalled, the call re-locks the mutex before it returns.

This allows the implementation of predictable scheduling if desired, in that the thread that would be doing the signalling can wait until the mutex is released to do its processing and then signal the condition.

Amber 2010-05-04 08:12:40

So… is there a reason for me to *not* just leave the mutex always-unlocked, and then lock it right before waiting, and then unlock it right after waiting finishes?

elliottcable 2010-05-04 08:14:14

The mutex also solves some potential races between the waiting and signalling threads. As long as the mutex is always locked when changing the condition and signalling, you'll never find yourself missing the signal and sleeping forever.

Hasturkun 2010-05-04 08:44:00

So… I should *first* wait-on-mutex on the conditionvar’s mutex, before waiting on the conditionvar? I’m not sure I understand at all.

elliottcable 2010-05-04 09:42:59

Answer 3

A:

Besides consistency, I think it might be related to memory visibility. It seems that POSIX guarantees memory visibility between threads (that is, two threads see memory equally) after four kinds of events, and one of them is pthread_mutex_unlock().

So this might look like this:

Step 1. Thread 1 has done some work and starts waiting on a condition variable, which unlocks a mutex. Unlocking the mutex guarantees (according to rule #2) that all other threads will see all modifications in memory that have been made by thread 1. If you do not unlock any mutex at this point then, according to this book, it is not guaranteed that other threads will see changes made by thread 1.

Step 2. Thread 2 changes some data and signals the condition variable. According to rule #4, thread 1 will see all modifications made by thread 2.

skwllsp 2010-05-04 08:57:45

Answer 4

+6 A:

A condition variable would be quite limited if you could only signal a condition; usually you need to handle some data that's related to the condition that was signalled. Signalling and wakeup have to be done atomically to achieve that without introducing race conditions or becoming overly complex.

pthreads can also give you, for rather technical reasons, a spurious wakeup. That means you need to check a predicate, so you can be sure the condition actually was signalled and distinguish that from a spurious wakeup. Checking such a predicate while waiting for it needs to be guarded, so a condition variable needs a way to atomically wait/wake up while locking/unlocking a mutex guarding that condition.

Consider a simple example where you're notified that some data are produced. Maybe another thread made some data that you want, and set a pointer to that data.

Imagine a producer thread giving some data to another consumer thread through a 'some_data' pointer.

while(1) {
    pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
    char *data = some_data;
    some_data = NULL;
    handle(data);
}

You'd naturally get a lot of race conditions: what if the other thread did some_data = new_data right after you got woken up, but before you did data = some_data?

You cannot really create your own mutex to guard this case either, e.g.:

while(1) {

    pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
    pthread_mutex_lock(&mutex);
    char *data = some_data;
    some_data = NULL;
    pthread_mutex_unlock(&mutex);
    handle(data);
}

This will not work: there's still a chance of a race condition in between waking up and grabbing the mutex. Locking the mutex before the pthread_cond_wait doesn't help you either, as you would then hold the mutex while waiting, i.e. the producer would never be able to grab the mutex. (Note: in this case you could create a second condition variable to signal the producer that you're done with some_data, though this will become complex, especially if you want many producers/consumers.)

Thus you need a way to atomically release/grab the mutex when waiting/waking up from the condition. That's what pthread condition variables do, and here's what you'd do:

while(1) {
    pthread_mutex_lock(&mutex);
    while(some_data == NULL) { // predicate to account for spurious wakeups; would also
                               // make it robust if there were several consumers
       pthread_cond_wait(&cond,&mutex); // atomically unlocks mutex and waits; re-locks before returning
    }
    }

    char *data = some_data;
    some_data = NULL;
    pthread_mutex_unlock(&mutex);
    handle(data);
}

(the producer would naturally need to take the same precautions, always guarding 'some_data' with the same mutex, and making sure it doesn't overwrite some_data if some_data is currently != NULL)
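For completeness, here is one way the producer side could be written alongside the consumer. This is a sketch under assumptions: it uses a second condition variable, here called `slot_free`, so the producer can wait until the consumer has taken the previous pointer, and the names `data_ready`, `produce`, `consume` and `run_handoff` are invented for the example. The essential points from the answer are kept: `some_data` is only touched with the mutex held, and both sides wait in a loop on a predicate.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER; /* assumed name */
static pthread_cond_t slot_free = PTHREAD_COND_INITIALIZER;  /* assumed name */
static const char *some_data = NULL;                         /* protected by mutex */

static void produce(const char *new_data) {
    pthread_mutex_lock(&mutex);
    while (some_data != NULL)            /* never overwrite unconsumed data */
        pthread_cond_wait(&slot_free, &mutex);
    some_data = new_data;
    pthread_cond_signal(&data_ready);
    pthread_mutex_unlock(&mutex);
}

static const char *consume(void) {
    pthread_mutex_lock(&mutex);
    while (some_data == NULL)            /* predicate guards against early or spurious wakeups */
        pthread_cond_wait(&data_ready, &mutex);
    const char *data = some_data;
    some_data = NULL;
    pthread_cond_signal(&slot_free);     /* the producer may publish again */
    pthread_mutex_unlock(&mutex);
    return data;
}

static void *producer_thread(void *arg) {
    const char **messages = arg;         /* NULL-terminated array of strings */
    for (size_t i = 0; messages[i] != NULL; i++)
        produce(messages[i]);
    return NULL;
}

/* Hands three strings from a producer thread to the caller and
   returns their combined length. */
size_t run_handoff(void) {
    static const char *messages[] = {"alpha", "beta", "gamma", NULL};
    pthread_t t;
    size_t total_len = 0;
    if (pthread_create(&t, NULL, producer_thread, messages) != 0)
        return 0;
    for (int i = 0; i < 3; i++)
        total_len += strlen(consume());
    pthread_join(t, NULL);
    assert(some_data == NULL);           /* everything produced was consumed */
    return total_len;
}
```

`run_handoff()` should return 14, the combined length of the three strings, whichever thread happens to run first.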

nos 2010-05-04 09:01:33

Shouldn't the inner `while` loop around `pthread_cond_wait` be a do-while loop so that it waits for the condition variable at least once?

Judge Maygarden 2010-05-04 13:08:20

No. What you're really waiting for, is for 'some_data' to be non-null. If it is non-null the "first time", great, you're holding the mutex and can safely use the data. If you had a do/while loop you would miss the notification if someone signalled the condition variable before you waited on it (it's nothing like the events found on win32 which stay signalled until someone waits for them)

nos 2010-05-04 13:32:56

Answer 5

+1 A:

The race it's avoiding is actually quite simple. Consider a consumer/producer where wait();, unlock(); and lock(); were distinct:

Consumer:

while (1)
{
    wait();
    lock();
    consume();
    unlock();
}

Producer:

/* ... */
lock();
produce();
signal();
unlock();
/* ... */

Under this scheme, if the Consumer happens to be in between unlock(); and wait(); when the Producer executes signal();, it will miss the signal and potentially wait forever.

Under the scheme where wait(); atomically unlocks and blocks, then re-locks before returning, the Consumer becomes:

lock();
while (1)
{
    wait();
    consume();
}

Because signal(); is only done under the lock, and the only time the Consumer doesn't hold the lock is atomically within wait();, the race is avoided.
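A related point is that a condition variable does not remember signals, so the state being waited for has to live in a variable protected by the same lock. The following C sketch is an assumption-laden illustration (the names `produced`, `produce` and `run_early_signal` are invented): every signal is sent before the consumer thread even exists, yet nothing is lost, because the consumer checks the counter under the mutex before deciding to wait.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int produced = 0;  /* items ready to consume; protected by mtx */

static void produce(void) {
    pthread_mutex_lock(&mtx);
    produced++;
    pthread_cond_signal(&cv);   /* no effect if nobody is waiting yet */
    pthread_mutex_unlock(&mtx);
}

/* arg points to an int: the number of items wanted on entry,
   the number actually consumed on exit. */
static void *consumer(void *arg) {
    int *count = arg;
    int want = *count;
    int consumed = 0;
    pthread_mutex_lock(&mtx);
    while (consumed < want) {
        while (produced == 0)   /* wait only if there is really nothing there */
            pthread_cond_wait(&cv, &mtx);
        produced--;
        consumed++;
    }
    pthread_mutex_unlock(&mtx);
    *count = consumed;
    return NULL;
}

/* Produces `items` items first, starts the consumer afterwards,
   and returns how many items the consumer saw. */
int run_early_signal(int items) {
    pthread_t t;
    int count = items;
    for (int i = 0; i < items; i++)
        produce();              /* all signals happen before any wait */
    if (pthread_create(&t, NULL, consumer, &count) != 0)
        return -1;
    pthread_join(t, NULL);
    assert(produced == 0);
    return count;
}
```

`run_early_signal(3)` should return 3 without blocking, since the consumer never needs to wait for a signal that has already come and gone.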

caf 2010-05-04 12:22:09
