views:

360

answers:

5

I have 3 processes which need to be synchronized. Process one does something then wakes process two and sleeps, which does something then wakes process three and sleeps, which does something and wakes process one and sleeps. The whole loop is timed to run around 25hz (caused by an external sync into process one before it triggers process two in my "real" application). I use sem_post to trigger (wake) each process, and sem_timedwait() to wait for the trigger.

This all works successfully for several hours. However at some random time (usually after somewhere between two and four hours), one of the processes starts timing out in sem_timedwait(), even though I am sure the semaphore is being triggered with sem_post(). To prove this I even use sem_getvalue() immediately after the timeout, and the value is 1, so the timedwait should have been triggered.

Please see following code:

#include <stdio.h>
#include <time.h>
#include <string.h>
#include <errno.h>
#include <semaphore.h>

sem_t trigger_sem1, trigger_sem2, trigger_sem3;

// The main thread process.  Called three times with a different num arg - 1, 2 or 3.
void *thread(void *arg)
{
  int num = (int) arg;
  sem_t *wait, *trigger;
  int val, retval;
  struct timespec ts;
  struct timeval tv;

  switch (num)
    {
      case 1:
        wait = &trigger_sem1;
        trigger = &trigger_sem2;
        break;
      case 2:
        wait = &trigger_sem2;
        trigger = &trigger_sem3;
        break;
      case 3:
        wait = &trigger_sem3;
        trigger = &trigger_sem1;
        break;
    }

  while (1)
    {
      // The first thread delays by 40ms to time the whole loop.  
      // This is an external sync in the real app.
      if (num == 1)   
        usleep(40000);

      // print sem value before we wait.  If this is 1, sem_timedwait() will
      // return immediately, otherwise it will block until sem_post() is called on this sem. 
      sem_getvalue(wait, &val);
      printf("sem%d wait sync sem%d. val before %d\n", num, num, val);

          // get current time and add half a second for timeout.
      gettimeofday(&tv, NULL);
      ts.tv_sec = tv.tv_sec;
      ts.tv_nsec = (tv.tv_usec + 500000);    // add half a second
      if (ts.tv_nsec > 1000000)
        {
          ts.tv_sec++;
          ts.tv_nsec -= 1000000;
        }
      ts.tv_nsec *= 1000;    /* convert to nanosecs */

      retval = sem_timedwait(wait, &ts);
      if (retval == -1)
        {
          // timed out.  Print value of sem now.  This should be 0, otherwise sem_timedwait
          // would have woken before timeout (unless the sem_post happened between the 
          // timeout and this call to sem_getvalue).
          sem_getvalue(wait, &val);
          printf("!!!!!!    sem%d sem_timedwait failed: %s, val now %d\n", 
            num, strerror(errno), val);
        }
      else
        printf("sem%d wakeup.\n", num);

        // get value of semaphore to trigger.  If it's 1, don't post as it has already been 
        // triggered and sem_timedwait on this sem *should* not block.
      sem_getvalue(trigger, &val);
      if (val <= 0)
        {
          printf("sem%d send sync sem%d. val before %d\n", num, (num == 3 ? 1 : num+1), val);
          sem_post(trigger);
        }
      else
        printf("!! sem%d not sending sync, val %d\n", num, val);
    }
}



int main(int argc, char *argv[])
{
  pthread_t t1, t2, t3;

   // create semaphores.  val of sem1 is 1 to trigger straight away and start the whole ball rolling.
  if (sem_init(&trigger_sem1, 0, 1) == -1)
    perror("Error creating trigger_listman semaphore");
  if (sem_init(&trigger_sem2, 0, 0) == -1)
    perror("Error creating trigger_comms semaphore");
  if (sem_init(&trigger_sem3, 0, 0) == -1)
    perror("Error creating trigger_vws semaphore");

  pthread_create(&t1, NULL, thread, (void *) 1);
  pthread_create(&t2, NULL, thread, (void *) 2);
  pthread_create(&t3, NULL, thread, (void *) 3);

  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  pthread_join(t3, NULL);
}

The following output is printed when the program is running correctly (at the start and for a random but long time after). The value of sem1 is always 1 before thread1 waits as it sleeps for 40ms, by which time sem3 has triggered it, so it wakes straight away. The other two threads wait until the semaphore is received from the previous thread.

[...]
sem1 wait sync sem1. val before 1
sem1 wakeup.
sem1 send sync sem2. val before 0
sem2 wakeup.
sem2 send sync sem3. val before 0
sem2 wait sync sem2. val before 0
sem3 wakeup.
sem3 send sync sem1. val before 0
sem3 wait sync sem3. val before 0
sem1 wait sync sem1. val before 1
sem1 wakeup.
sem1 send sync sem2. val before 0
[...]

However, after a few hours, one of the threads begins to timeout. I can see from the output that the semaphore is being triggered, and when I print the value after the timeout, it is 1. So sem_timedwait should have woken up well before the timeout. I would never expect the value of the semaphore to be 1 after the timeout, save for the very rare occasion (almost certainly never but it's possible) when the trigger happens after the timeout but before I call sem_getvalue.

Also, once it begins to fail, every sem_timedwait() on that semaphore also fails in the same way. See the following output, which I've line-numbered:

01  sem3 wait sync sem3. val before 0
02  sem1 wakeup.
03  sem1 send sync sem2. val before 0
04  sem2 wakeup.
05  sem2 send sync sem3. val before 0
06  sem2 wait sync sem2. val before 0
07  sem1 wait sync sem1. val before 0
08  !!!!!!    sem3 sem_timedwait failed: Connection timed out, val now 1
09  sem3 send sync sem1. val before 0
10  sem3 wait sync sem3. val before 1
11  sem3 wakeup.
12  !! sem3 not sending sync, val 1
13  sem3 wait sync sem3. val before 0
14  sem1 wakeup.
[...]

On line 1, thread 3 (which I have confusingly called sem1 in the printf) waits for sem3 to be triggered. On line 5, sem2 calls sem_post for sem3. However, line 8 shows sem3 timing out, but the value of the semaphore is 1. thread3 then triggers sem1 and waits again (10). However, because the value is already 1, it wakes straight away. It doesn't send sem1 again as this has all happened before control is given to thread1, however it then waits again (val is now 0) and sem1 wakes up. This now repeats for ever, sem3 always timing out and showing that the value is 1.

So, my question is why does sem3 timeout, even though the semaphore has been triggered and the value is clearly 1? I would never expect to see line 08 in the output. If it times out (because, say thread 2 has crashed or is taking too long), the value should be 0. And why does it work fine for 3 or 4 hours first before getting into this state?

I have tried a similar test using three separate programs, communicating over shared memory, rather than three threads in the same program. This more closely resembles my real world application. The results and the output were the same. The problem does appear to be in the semaphore (particularly the sem_timedwait call) rather than anything to do with pthreads.

I have also tried shorter and longer delays, as well as removing the delay completely, with similar results to those described above. With no delay at all it can sometimes start to produce the error after minutes rather than hours. This does of course mean that the problem can be reproduces a lot quicker.

This is using Ubuntu 9.4 with kernel 2.6.28. The same procedure has been working properly on Redhat and Fedora, but I'm now trying to port to Ubuntu. I have also tried using Ubuntu 9.10, which made no difference.

Thanks for any advice, Giles

+1  A: 

I have no clue on what is going wrong and the code looks fine too me. Here are some things you could do do somehow get more information.

  • use a different timeout, both shorter and longer, and see if your problem still occurs.
  • use a non timed version, and see if the programm hangs.
  • try to modify the behaviour of your kernel scheduler, for example using kernel command line parameters, or using procfs or sysfs.

As pointed by Jens, there are two races :

The first is when evaluationg the value of the semaphore, after the call to sem_timedwait. This is not changing the control flow whit respects to semaphore. Wether the thread timedout or not, it still goes through the "should I trigger the next thread" block.

The second is in the "Should I wakeup the next thread" part. We could have the following events :

  1. Threads n calls sem_getvalue(trigger) and gets a 1
  2. Thread n+1 returns from sem_timedwait and the semaphore goes to 0
  3. Thread n decides not to post and the semaphore stays to 0

Now, I can't see how this could trigger the observed behaviour. After all, since Thread n+1 is waked up anyway, it will in turn wake up thread n+2 which will wake up thread n etc...

While it is possible to get glitches, I can't see how this could lead to systematic timeout from a thread.

shodanex
Thanks a lot for the answer. It certainly is a strange one and, as I said, it works fine on Redhat and Fedora. I will try your three suggestions to see if they throw some light on the situation.
gillez
In the meantime I still have my fingers crossed for some other more definitive answers by an Ubuntu expert...
gillez
An update on the results of these suggestions. With a no timeout the program hangs quite quickly, after only a few minutes. With a short timeout (usleep 40, rather than 40000), the program also hangs quite quickly, after 10 minutes or so. I've onyl tried it once with a longer timeout, but it did hang eventually. Basically the quicker it goes, and the more times around the loop, the more chance there is of the program hanging.
gillez
Good, so now you have a faster reproduction :)
shodanex
Yes, indeed. It makes things much easier for testing. Still no closer to finding a solution though.
gillez
By the way Shodanex, I would really like to vote you up for all the useful comments, but I can't do that until I have a reputation of at least 15. I guess I need to ask a few more questions or provide some answers. But as soon as I can, I will.
gillez
A: 

This is very interesting. While I have not located the source of the error, (still looking) I have verified this on Ubuntu 9.04 running Linux 2.6.34.

Brian G. Marete
Excellent. Thank you very much for trying. To be honest, I thought it was too late for more replies and my question had been lost in the backlog of stackoverflow! Glad to hear you can see the same behaviour and it's not just some weird setup on my system. Would be great to find out the reason for it - here's hoping...
gillez
+1  A: 

Don't blame ubuntu or any other distro on it :-) What is certainly more important here is the version of gcc you are using, 32 or 64 bit etc, how many cores your system has. So please give a bit more information. But looking through your code I found several places that could just bring you unexpected behavior:

  • it starts with the start, casting int in void* back and forth, you are looking for trouble. Use uintptr_t for that if you must, but here you have no excuse to just pass real pointers to the values. &(int){ 1 } and some saner casting would do the trick for C99.

  • ts.tv_nsec = (tv.tv_usec + 500000) is another trouble spot. The right hand side might be of a different width then the left hand side. Do

    ts.tv_nsec = tv.tv_usec;

    ts.tv_nsec += 500000;

  • The sem family of functions are not interrupt safe. Such interrupts may e.g be triggered by IO, since you are doing printf etc. Checking the return value for -1 or so is not sufficient but in such a case you should check errno and decide if you want to retry. Then you'd have to do the recalculation of the remaining time and stuff like that if you want to be precise. Then man page for sem_timedwait has a list of the different error codes that might occur and their reasons.

  • You also conclude things from values that you get via sem_getvalue. In a multi-threading / multi-process / multi-processor environment your thread might have been unscheduled between the return from sem_timedwait and sem_getvalue. Basically you can't deduce anything from that, the variable is just incidentally at the value that you observe. Don't draw conclusions from that.

Jens Gustedt
What do you mean by not interrupt safe ? This does not look like a -EINTR problem
shodanex
@shodanex: Why can you say it doesn't? We just don't have the information to exclude any such thing. The threads can get IO interrupts when they want to the the prints, they are rescheduled or whatever. The sem interface is really low level and as it is written this code is very likely to behave very differently when it is executed on a mono-core or multi-core system, for example. I think the code should first be tidied up before we can say a thing. As it is presented here it is just bad style and non portable.
Jens Gustedt
Because the error message is printed and it is not -EINTR.It is also hanging when not using timed waits. Unless the kernel is not behaving correctly, the sem_* operation should not _behave_ differently wether it is on a monocore or a multicore. Program that works because they depend on a particular ordering of sem operation would behave differently of course. Let's point where the program is depending on a particular scheduling to provoke the described bahaviour. The fact that the value might change between sem_timedwait and sem_getvalue does not explain the observed behaviour.
shodanex
In fine, it should not behave differently, so I guess the bug is there, even when you think it works fine. Scheduling, ordering, parallel execution just triggers race conditions much easier.On what concerns the sem_getvalue, at least in the version as it is here, the control flow depends on it. Make the sem_post unconditional of the value of the semaphore.
Jens Gustedt
Hi Jens and Shodanex, thanks for the comments. With regards to Jens, first two points, whether or not you consider these to be bad practice, they are not part of the problem (if there was an issue passing a pointer in an integer it would cause a problem long before we see the error here). On the third point, as shodanex has stated, the error printed with strerror (ie. from errno) is a timeout, not an interrupt.
gillez
On the fourth point, I did acknowledge the possibilty that the semaphore value could change between the timed out sem_timedwait() and the sem_getvalue(), but do you really believe that is going on here? Once the error starts and sem_timedwait() times out, it happens every time. Also I had another version which printed the semaphore value after the sem_post() call and this was clearly showing the semaphore value had changed before the sem_timedwait() failure.
gillez
@jens: "make the sem_post unconditional of the value of the semaphore". Why? The sem_post is always called unless the value is 1. If it *is* 1, then sem_timedwait() will trigger straight away so why we don't need (and in fact don't want) to call sem_post.
gillez
@gillez: I think, after the program failed to decrease the value the semaphore never falls back to 0 again. It is 1, the next post of the other thread put it to 2. So the semaphore always has enough tokens to never let anybody wait again.
Jens Gustedt
+2  A: 

(Sorry to give a second answer but this one would be too messy to clean up just with editing)

The answer is, I think, already in the original post for the question.

So, my question is why does sem3 timeout, even though the semaphore has been triggered and the value is clearly 1? I would never expect to see line 08 in the output. If it times out (because, say thread 2 has crashed or is taking too long), the value should be 0. And why does it work fine for 3 or 4 hours first before getting into this state?

So the scenario is:

  1. thread 2 takes too long
  2. sem3 times out in sem_timedwait
  3. thread 3 is descheduled or whatever it takes it to reach the sem_getvalue
  4. thread 2 wakes up and does its sem_post on sem3
  5. thread 3 issues its sem_getvalue and sees a 1
  6. thread 3 branches into the wrong branch and doesn't do its sem_post on sem1

This race condition is hard to trigger, basically you have to hit the tiny time window where one thread has had a problem in waiting for the semaphore and then reads the semaphore with the sem_getvalue. The occurrence of that condition is much dependent of the environment (type of system, number of cores, load, IO interrupts) so this explains why it only occurs after hours, if not at all.

Having the control flow depend of a sem_getvalue is generally a bad idea. The only atomic non-blocking access to a sem_t is through sem_post and sem_trywait.

So this example code from the question has that race condition. This doesn't mean that the original problem code that gillez had, does indeed have the same race condition. Perhaps the example is just too simplistic, and still shows the same phenomenon for him.

My guess is, in his original problem there was an unprotected sem_wait. That is a sem_wait that is only checked for its return value and not for errno in the event that it fails. EINTRs do occur on sem_wait quite naturally if the process has some IO. You have just do a do - while with check and reset of errno if you encounter a EINTR.

Jens Gustedt
Hi Jens,Thanks for the comprehensive answer. I do appreciate your thoughts on this. I understand your answer but still don't still don't really think it explains the behaviour that I see (continued in the next comment due to character limit...)
gillez
Once it fails, it then fails every time around the loop. Each time, the value returned by sem_getvalue is 1. I do appreciate the point that this is done after the call to sem_timedwait has failed and therefore thread two could have woken up and done a sem_post in between. And so I can understand how this could happen very occasionally. But after it fails once, it shows the same behaviour every time around the loop. Does this mean that step four in your scenario is happening in this tiny window every time? After the sem_timedwait fails the value from the sem_getvalue is *always* 1.
gillez
Also, the time out is set to quite a large interval (half a second). I really don't understand why thread 2 can take so long (step 1) that thread 3 times out, unless the system is really busy. Unless it's not actually timing out but is failing due to an interrupt, in which case it could be failing a lot quicker than 1/2 a second, but if that's the case why does the strerror output say "connection timed out" rather than "interrupt"? I will try increasing the timeout to 10 seconds or so. I will also call strerror with errno before calling sem_timedout, in case that is changing errno.
gillez
Once the program missed a wait that corresponds to a post of the other thread, the semaphore probably never falls to 0 again. It always is 1 or 2. So no it is not the same behavior every time, your (implicit) invariant just doesn't hold anymore.For the time out value, 0.5 seconds is not much. If there is something heavy going on on your system, a thread might well get descheduled for that amount of time. There is even another point where the thread may be descheduled, namely right after getimeofday. I'd do a loop for some seconds, with a printf for warning.
Jens Gustedt
If the semaphore value is 1 or 2 rather than 0, the next sem_timedwait would return straight away without timing out. The behaviour is the opposite of this.
gillez
@gillez: I don't know, since I was not able to reproduce your error on my machine (ubuntu, though ;-) I wouldn't be of much more help. What I pointed out, is that you definitively have a race condition in the code you posted. There might be others, who knows raw semaphores are a subtle business, they are too low level. Mutexes and conditions are much better suited for an application context. I think it would be best you tidy up your code as I indicated and test again.
Jens Gustedt
A: 

I gave the program a shot on my Ubuntu 10.04 x86_64 Core i7 machine.

When running with usleep(40000), the program ran fine for half an hour or something boring.

When running with usleep(40), the program ran fine for another half hour, maybe more, before my machine froze. X died. Control+alt+F1-7 died. I couldn't ssh in. (Sadly, this goofy Apple keyboard doesn't have a sysrq key. I like typing on it, but I sure don't need f13, f14, or f15. I'd do horrible things to get a proper sysrq key.)

And the absolute best: NOTHING in my logs tells me what happened.

$ uname -a
Linux haig 2.6.32-22-generic #36-Ubuntu SMP Thu Jun 3 19:31:57 UTC 2010 x86_64 GNU/Linux

At the same time, I was also playing a Java game in browser (posted by a fellow stackoverflow user looking for feedback, fun diversion :) -- so it's possible that the jvm is responsible for tickling something to freeze my machine solid.

sarnold