Hi. I'm synchronizing reader and writer processes on Linux.

I have zero or more processes (the readers) that need to sleep until they are woken up, read a resource, go back to sleep, and so on. Please note that I don't know how many reader processes are up at any moment. I have one process (the writer) that writes to a resource, wakes up the readers, and goes about its business until another resource is ready (in detail, I developed a starvation-free readers-writers solution, but that's not important).

To implement the sleep / wake-up mechanism I use a POSIX condition variable, pthread_cond_t. The clients call pthread_cond_wait() on the variable to sleep, while the server calls pthread_cond_broadcast() to wake them all up. As the manual says, I surround these two calls with a lock/unlock of the associated pthread mutex.

The condition variable and the mutex are initialized in the server and shared between processes through a shared memory area (because I'm not working with threads, but with separate processes), and I'm sure my kernel / syscalls support it (because I checked _POSIX_THREAD_PROCESS_SHARED).

What happens is that the first client process sleeps and wakes up perfectly. When I start the second process, it blocks on its pthread_cond_wait() and never wakes up, even if I'm sure (by the logs) that pthread_cond_broadcast() is called.

If I kill the first process and launch another one, it works perfectly. In other words, pthread_cond_broadcast() on the condition variable seems to wake up only one process at a time. If more than one process waits on the very same shared condition variable, only the first one manages to wake up correctly, while the others just seem to ignore the broadcast.

Why does this happen? If I send a pthread_cond_broadcast(), every waiting process should wake up, not just one (and, in any case, not always the same one).

+2  A: 

Have you set the PTHREAD_PROCESS_SHARED attribute on both your condvar and mutex?

For Linux consult the following man pages:

Methods, types, constants etc. are normally defined in /usr/include/pthread.h, /usr/include/nptl/pthread.h.
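As a rough sketch of what setting the attribute looks like: the struct layout, the function names, and the use of an anonymous shared mapping plus fork() are assumptions made for illustration (the question uses its own shared memory area). Only the pthread_*attr_setpshared() calls and PTHREAD_PROCESS_SHARED come from POSIX. Build with -pthread.

```c
#define _GNU_SOURCE            /* for MAP_ANONYMOUS on some toolchains */
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the shared area; names are illustrative only. */
struct shared_area {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    unsigned        generation;   /* bumped by the writer on every update */
};

/* Initialise the mutex and the condvar as process-shared. Returns 0 on success. */
static int shared_area_init(struct shared_area *s)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t  ca;
    int rc;

    if ((rc = pthread_mutexattr_init(&ma)) != 0) return rc;
    rc = pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    if (rc == 0) rc = pthread_mutex_init(&s->lock, &ma);
    pthread_mutexattr_destroy(&ma);
    if (rc != 0) return rc;

    if ((rc = pthread_condattr_init(&ca)) != 0) return rc;
    rc = pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    if (rc == 0) rc = pthread_cond_init(&s->ready, &ca);
    pthread_condattr_destroy(&ca);
    if (rc != 0) return rc;

    s->generation = 0;
    return 0;
}

/* Forks up to nreaders children that each wait for one broadcast.
 * Returns how many children woke up and exited cleanly, or -1 on setup failure. */
static int demo_shared_broadcast(int nreaders)
{
    struct shared_area *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int forked = 0, woke = 0, status;

    if (s == MAP_FAILED) return -1;
    if (shared_area_init(s) != 0) { munmap(s, sizeof *s); return -1; }

    for (int i = 0; i < nreaders; i++) {
        pid_t pid = fork();
        if (pid < 0) break;                    /* continue with the readers we have */
        if (pid == 0) {                        /* reader process */
            pthread_mutex_lock(&s->lock);
            while (s->generation == 0)         /* predicate guards against lost wakeups */
                pthread_cond_wait(&s->ready, &s->lock);
            pthread_mutex_unlock(&s->lock);
            _exit(0);
        }
        forked++;
    }

    /* Writer: change the shared state, then wake every waiting reader. */
    pthread_mutex_lock(&s->lock);
    s->generation++;
    pthread_cond_broadcast(&s->ready);
    pthread_mutex_unlock(&s->lock);

    for (int i = 0; i < forked; i++)
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            woke++;

    munmap(s, sizeof *s);
    return woke;
}
```

Calling demo_shared_broadcast(3) from main() should report that all three readers woke up. Without the two setpshared calls the objects keep the default PTHREAD_PROCESS_PRIVATE setting, and using them from several processes has undefined behaviour.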

vladr
Vlad, I'm on Linux, there's no such attribute (according to the manpages).
janesconference
@james, check your header files (`find /usr/include/ -type f | xargs egrep '(PTHREAD_PROCESS_SHARED|pthread_condattr_setpshared|pthread_mutexattr_setpshared)'`), it should all be there in `/usr/include/pthread.h`, even on Linux (it's POSIX after all, and I have it on my CentOS 4.x box.)
vladr
...which also begs the question, while we're at it, what Linux are you on? :) (`uname -a; cat /etc/issue`)
vladr
I'm on Montavista for ARM 9.
janesconference
you were right! thanks.
janesconference
glad to be of help. cheers.
vladr
+1  A: 

The documentation says that it should work... are you sure it's the same condition variable that the rest of the threads are looking at?

This is the example code from opengroup.org:

pthread_cond_wait(mutex, cond):
    value = cond->value; /* 1 */
    pthread_mutex_unlock(mutex); /* 2 */
    pthread_mutex_lock(cond->mutex); /* 10 */
    if (value == cond->value) { /* 11 */
        me->next_cond = cond->waiter;
        cond->waiter = me;
        pthread_mutex_unlock(cond->mutex);
        unable_to_run(me);
    } else
        pthread_mutex_unlock(cond->mutex); /* 12 */
    pthread_mutex_lock(mutex); /* 13 */


pthread_cond_signal(cond):
    pthread_mutex_lock(cond->mutex); /* 3 */
    cond->value++; /* 4 */
    if (cond->waiter) { /* 5 */
        sleeper = cond->waiter; /* 6 */
        cond->waiter = sleeper->next_cond; /* 7 */
        able_to_run(sleeper); /* 8 */
    }
    pthread_mutex_unlock(cond->mutex); /* 9 */
Lirik
+1  A: 

Do you test for some condition before your process actually calls pthread_cond_wait()? I am asking because it's a very common mistake: your process must not call wait() unless you are sure that some process will call signal() (or broadcast()) later.

Consider this code (from the pthread_cond_wait man page):

          pthread_mutex_lock(&mut);
          while (x <= y) {
                  pthread_cond_wait(&cond, &mut);
          }
          /* operate on x and y */
          pthread_mutex_unlock(&mut);

If you omit the while test and simply signal from another process once the waiter may proceed (that is, once x > y), it won't work, because a signal only wakes processes that are already waiting. If signal() is called before the other process calls wait(), the signal is lost and the waiting process will wait forever.

EDIT: About the while loop. When you signal one process from another, the woken process is put on the ''ready list'' but is not necessarily scheduled right away, and your condition (x <= y) may change again in the meantime, since no one holds the lock. That's why you need to check your condition each time you are about to wait. It should always be: wake up -> check whether the condition still holds -> do work.
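For a writer that broadcasts to several readers, one way to apply the same pattern is to have each reader remember which update it saw last and wait only while nothing newer exists. This is a sketch under assumptions: the generation counter and every name below are invented for illustration, and it uses threads in one process to stay short (build with -pthread).

```c
#include <pthread.h>

/* Hypothetical shared state guarded by `lock`. */
struct resource {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    unsigned        generation;   /* incremented on every publish */
    int             value;
};

struct reader_arg {
    struct resource *res;
    unsigned         seen;        /* last generation this reader consumed */
    int              got;         /* value read after waking up */
};

static void *reader(void *p)
{
    struct reader_arg *a = p;
    struct resource *r = a->res;

    pthread_mutex_lock(&r->lock);
    while (r->generation == a->seen)          /* re-checked after every wakeup */
        pthread_cond_wait(&r->changed, &r->lock);
    a->seen = r->generation;
    a->got  = r->value;
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Writer side: update the state under the lock, then wake all readers. */
static void publish(struct resource *r, int value)
{
    pthread_mutex_lock(&r->lock);
    r->value = value;
    r->generation++;
    pthread_cond_broadcast(&r->changed);
    pthread_mutex_unlock(&r->lock);
}

/* Starts up to 8 readers, publishes once, and returns how many readers
 * ended up with the published value. */
static int demo_publish(int n, int value)
{
    struct resource r = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };
    pthread_t tid[8];
    struct reader_arg arg[8];
    int started = 0, ok = 0;

    if (n > 8) n = 8;
    for (int i = 0; i < n; i++) {
        arg[i] = (struct reader_arg){ &r, 0, -1 };
        if (pthread_create(&tid[i], NULL, reader, &arg[i]) != 0) break;
        started++;
    }

    publish(&r, value);   /* may run before some readers have started waiting */

    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        if (arg[i].got == value) ok++;
    }
    return ok;
}
```

A reader that starts after publish() has already run sees generation != seen and skips the wait entirely, so the broadcast it missed does no harm.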

hope it's clear.

Ben
I don't fully understand your answer.. how can adding a while loop prevent the wait from blocking?
janesconference
@janesconference : see my edit
Ben
A: 

What the last poster said is correct. The KEY to making the whole condvar arrangement work correctly is that the condvar is NOT signalled before it is waited on. It's strictly a signal to be used when others (one or many) are waiting. When no one is waiting, it's effectively a no-op. Which, by the way, is NOT how I believe it SHOULD work, but it is how it DOES work.

larry
