ansaurus

Question

Signal safe use of sem_wait()/sem_post()

Answer 1

+2 A:

Are you sure sem_wait() causes signals to be blocked? I don't think this is the case. According to the man page, sem_wait() returns -1 and sets errno to EINTR if it is interrupted by a signal handler.

You should be able to handle this error code and then your signals will be received. Have you run into a case where signals have not been received?

I would make sure you handle every error code that sem_wait() can return. Failures may be rare, but if you want to be 100% sure, you need to cover all of them.
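As a minimal sketch of handling EINTR, the wait can be wrapped in a retry loop. The helper name sem_wait_retry is made up for illustration and does not appear in the question; whether to retry or to give up on EINTR depends on what the program wants to do when a signal arrives.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper (not from the question): wait on a semaphore and
 * retry whenever a signal handler interrupts the call.
 * Returns 0 on success, or -1 with errno set for any other failure. */
int sem_wait_retry(sem_t *sem)
{
    int rc;

    do {
        rc = sem_wait(sem);            /* -1 with errno set on failure */
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```

A caller that wants to stop on a signal instead would drop the loop and treat EINTR as its cue to clean up.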

Kekoa 2009-06-01 23:39:37

Sorry if I am being unclear. sem_wait() doesn't cause signals to be blocked as you say, but I am using sigprocmask() to block them. But, I think you have the right solution which is to look at the EINTR error code which means the handler should not exit but set some flag to say "time to quit". I'll test that out. Thanks.

Jeremy 2009-06-01 23:46:29

That worked. I removed the blocking of signals and changed the signal handler to just set a global meaning it was time to quit. If the sem_wait() returns an error, then I quit which is fine and preserves the semaphore count. If I am waiting for the child, I test the timeToQuit global also. If it is time to quit, I kill the child and do a sem_post() which preserves the semaphore count. Thanks. That should be signal safe, ignoring kill -9.

Jeremy 2009-06-02 00:04:45

Answer 2

A:

Are you sure you are approaching the problem correctly? If you want to wait for a child terminating, you may want to use the waitpid() system call. As you observed, it is not reliable to expect the child to do the sem_post() if it may receive signals.
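For illustration, waiting for a child with waitpid() might look like the sketch below. The helper name wait_for_child is an assumption, and retrying on EINTR is just one possible policy.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until child `pid` terminates, retrying if a
 * signal handler interrupts the wait. Returns the child's exit status,
 * or -1 if waitpid() failed or the child did not exit normally. */
int wait_for_child(pid_t pid)
{
    int status;
    pid_t rc;

    do {
        rc = waitpid(pid, &status, 0);
    } while (rc == -1 && errno == EINTR);

    if (rc == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```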

Juliano 2009-06-02 00:04:22

Actually, the child is unaware of the semaphore and the parent is doing a waitpid() in the RunChild() function in the pseudocode which I didn't show. That part was fine and it was handling termination of the child and signals sent to the parent after the sem_take() completed. I was concerned about the critical section between setting the signal handler and the sem_wait() but checking the sem_wait() error code was the simple solution I couldn't see.

Jeremy 2009-06-02 00:08:08

Answer 3

A:

I have a similar problem with my application, which runs on an embedded Linux machine. It is basically a multiprotocol gateway that connects two industrial Ethernet networks.

We are experiencing some kind of application hang every 2 to 3 days. Both threads seem to be still running, but the SIGALRM signal is getting lost. I'm not completely sure this is the case, because the project is in its testing phase and I can't make any application changes until the end of the week. SIGALRM is used to implement the counters and fires every second.

I use semaphores inside the signal handler and inside a thread. I installed the SIGALRM handler with sigaction() and the SA_RESTART flag. I suspect this is the reason we are experiencing the counter (or application) hang.

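#include <signal.h>     /* sigaction, SIGALRM, SA_RESTART */
#include <string.h>     /* memset */
#include <sys/time.h>   /* setitimer, struct itimerval */

void timer_handler(int signo);   /* defined elsewhere in the application */

/* Install the SIGALRM handler and start a periodic 1-second timer. */
void start_timer(void)
{
   struct sigaction sa;
   struct itimerval timer;

   memset(&sa, 0, sizeof (sa));

   sa.sa_handler = &timer_handler;
   sa.sa_flags = SA_RESTART;   /* restart interrupted calls where possible */
   sigaction(SIGALRM, &sa, NULL);

   /* first expiry after one second */
   timer.it_value.tv_sec = 1;
   timer.it_value.tv_usec = 0;

   /* then once every second */
   timer.it_interval.tv_sec = 1;
   timer.it_interval.tv_usec = 0;

   setitimer(ITIMER_REAL, &timer, NULL);
}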


int main(int  argc, char *argv[])
{
    //  write log entry
    //  check arguments to main
    //  read files containing parameters  of remote devices 

    //  initialise semaphores 
    sem_init(&semCounters, 0, 1);
    sem_init(&semPackets, 0, 1);
    sem_init(&semEvents, 0, 1);

    start_timer();
    pthread_create(&thread1, NULL, firstThread, NULL);
    pthread_create(&thread2, NULL, secondThread, NULL);

    while(1)
    {            
        //  infinite loop
    }

    return 0;
}

Inside the second thread (secondThread) I perform multiple math operations on time counters. All counters are protected by a semaphore (semCounters). I'm using the same counters inside the signal handler! The semaphore is taken in both locations. Below is an example that reinitializes a protected variable. This code can be found both in the signal handler and in the thread.

sem_wait(&semCounters);
counter.t4 = 0;
sem_post(&semCounters);

One more thing: I'm creating other threads inside the signal handler, but I'm not waiting for those threads to finish processing. The application has been restarted 4 times so far. The longest running time without a hang is three days.
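One thing worth checking here: POSIX lists sem_post() as async-signal-safe, but not sem_wait(). If SIGALRM is delivered to a thread that currently holds semCounters, a sem_wait() on the same semaphore inside the handler can never succeed, because the holder cannot run until the handler returns. This may or may not be the cause of the hang described above. The sketch below shows one way to keep the handler down to a single sem_post() and move the counter update into an ordinary thread. semTick, wait_retry, handle_one_tick and tick_thread are hypothetical names, and the type of counter.t4 is assumed.

```c
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <stddef.h>

static sem_t semTick;                 /* hypothetical: counts pending timer ticks */
static sem_t semCounters;             /* guards `counter`, as in the post */
static struct { long t4; } counter;   /* only t4 is shown in the post; type assumed */

/* Signal handler: does nothing except sem_post(), which is async-signal-safe. */
void timer_handler(int signo)
{
    int saved_errno = errno;          /* keep errno intact for the interrupted code */
    (void)signo;
    sem_post(&semTick);
    errno = saved_errno;
}

/* Wait on a semaphore, retrying if a signal interrupts the call. */
static int wait_retry(sem_t *s)
{
    int rc;
    do {
        rc = sem_wait(s);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Consume one tick and update the counter in normal thread context. */
void handle_one_tick(void)
{
    wait_retry(&semTick);             /* blocks until the handler posts */
    wait_retry(&semCounters);
    counter.t4++;
    sem_post(&semCounters);
}

/* Thread body for pthread_create(): process ticks forever. */
void *tick_thread(void *arg)
{
    (void)arg;
    for (;;)
        handle_one_tick();
    return NULL;
}
```

With this layout, semTick would be initialised to 0 and semCounters to 1 before the timer is started, and no thread creation or semaphore wait is left in the handler.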

2010-09-02 08:59:36
