Hello all!

I'm migrating a multithreaded application from HP-UX to Solaris and so far everything is OK except for one thing! The application has a thread that handles the signals and, when some of them are received, it runs some cleanup (logging, killing child processes and so on).

I've reduced the code as much as possible to make a reasonably simple example showing the problem:

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <synch.h>
#include <iostream>

using namespace std;

pthread_t       m_signalHandlerThread;
sigset_t        m_signalSet;

void    signalHandler()
{
    int sig = 0;    // receives the number of the signal returned by sigwait()

    while ( true )
    {
        cout << "SigWait..." << endl;
        sigwait( &m_signalSet, &sig );
        cout << "Signal!! : " << sig << endl;

        break;
    }

    cout << "OUT" << endl;
}

void*   signalHandlerThreadFunction( void* arg )
{
   signalHandler();

   return (void*)0;
}


int main()  
{
    sigemptyset( &m_signalSet );
    sigaddset( &m_signalSet, SIGQUIT );             //kill -QUIT
    sigaddset( &m_signalSet, SIGTERM );             //kill
    sigaddset( &m_signalSet, SIGINT );              //ctrl-C
    sigaddset( &m_signalSet, SIGHUP );              //reload config

    if ( pthread_create( &m_signalHandlerThread, NULL, signalHandlerThreadFunction, NULL ) )
    {
        cout << "cannot create signal handler thread, system shut down.\n" << endl;
    }

    int iTimeout = 0;
    while (1) 
    {
        if (iTimeout >= 10)
           break;

        sleep(1);
        iTimeout++;
        cout << "Waiting... " << iTimeout << endl;
    }

    cout << "END" << endl;

    exit (0);
}

The compile command lines were:

Solaris:

CC -m64 -g temp.cpp -D_POSIX_PTHREAD_SEMANTICS -lpthread

HP-UX:

/opt/aCC/bin/aCC +p +DA2.0W -AA -g -z -lpthread -mt -I/usr/include  temp.cpp     

Running both applications gives the following behaviour (pressing CTRL+C during the 10-second loop):

HP-UX:

./a.out

SigWait...
Waiting... 1
Waiting... 2
Signal!! : 2   <---- CTRL + C
OUT
Waiting... 3
Waiting... 4   <---- CTRL + C again to terminate

Solaris:

./a.out

SigWait...
Waiting... 1
Waiting... 2   <---- CTRL + C
^C

Any help will be more than welcome since I'm already tearing my hair out (not much left) :)!

Thanks!

+4  A: 

It's unspecified which of your 2 threads will handle SIGINT. If you need only one of your threads to handle the signal, you need to block that signal in all the other threads you have.

nos
Your answer pointed me in the right direction. Thanks.
JoaoSantos
A: 

This is a rather unorthodox way to handle signals. If you want to combine signals and threads, a better choice is to install the usual signal handlers and have them pass the signal on internally to another thread, which is responsible for the actual handling of the event.

That is also a better option because it is undefined which thread in an MT application receives the signal. Any thread that doesn't have the signal blocked might receive it. If you have two threads (and you do have two threads in the example), then either of them might get the SIGINT.

You might want to check sigprocmask() as a way to tell the OS that SIGINT should be blocked in a thread. That should be done for every thread, IIRC even the one calling sigwait().


Edit1. Actually, I was wrong about the "should be done for every thread" bit above. A new thread inherits its signal mask from the thread that creates it. I realized that the per-thread requirement can't be right, because it would introduce a race condition: a signal could arrive after a new thread has been created but before it has set its signal mask. In other words, it is sufficient to set the signal mask in the main thread.
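
The inheritance of the signal mask can be checked directly. What follows is only a minimal sketch, not code from the original application: the names `reportSigintBlocked` and `childInheritsBlockedSigint` are made up for illustration, and it uses `pthread_sigmask()` (the call suggested in another answer on this page) to block SIGINT in the calling thread and then lets a freshly created thread inspect its own mask.

```cpp
#include <pthread.h>
#include <signal.h>

// Thread body: report whether SIGINT is blocked in this thread's own mask.
static void* reportSigintBlocked(void* arg)
{
    sigset_t current;
    sigemptyset(&current);
    // With a null "set" argument the mask is left unchanged and only queried.
    pthread_sigmask(SIG_BLOCK, NULL, &current);
    *static_cast<bool*>(arg) = (sigismember(&current, SIGINT) == 1);
    return NULL;
}

// Hypothetical helper: blocks SIGINT in the calling thread, starts a thread,
// and returns true if the new thread started with SIGINT already blocked.
bool childInheritsBlockedSigint()
{
    sigset_t blockSet;
    sigemptyset(&blockSet);
    sigaddset(&blockSet, SIGINT);
    if (pthread_sigmask(SIG_BLOCK, &blockSet, NULL) != 0)
        return false;

    bool blockedInChild = false;
    pthread_t worker;
    if (pthread_create(&worker, NULL, reportSigintBlocked, &blockedInChild) != 0)
        return false;
    pthread_join(worker, NULL);   // the join makes the worker's write visible here
    return blockedInChild;
}
```

Calling `childInheritsBlockedSigint()` from `main()` before any other thread exists should return true, which is why setting the mask once in the main thread is enough for threads created afterwards.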

Dummy00001
Thanks! This did solve my problem, even in the entire application. I just don't understand why this works on HP-UX... maybe in that thread implementation all threads receive the signal?
JoaoSantos
"maybe in that thread implementation all threads receive the signal?" the application received the signal - but it may be handled in any *random* thread where it could be handled. HP-UX might have noticed that a thread in your app uses sigwait() while Solaris hasn't bothered. Signals vs. threads is best described as a gray area where you really do not want to experiment. Even POSIX can't describe the behavior in full since finer details differ greatly from one OS to another.
Dummy00001
I am giving -1 for this answer. This answer is wrong in that you cannot serialize the signal from the classic signal handler to any other thread or data structure. You would have to use some locking inside the signal handler and that is not possible. There is very little you can do in a signal handler, using locking primitives is not one of them.
wilx
@wilx: [Please read this](http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_04.html#tag_02_04) (scroll down for the list of functions that are safe to use from a signal handler). Locking isn't the only option: I generally use `write()` on a pipe, but used `sem_post()` once too.
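
For readers who have not seen the `write()`-on-a-pipe approach, here is a rough sketch of how it is usually arranged. It is an illustration under assumptions, not the commenter's actual code: the names `g_signalPipe`, `forwardSignal`, `installForwarder` and `readForwardedSignal` are hypothetical, and signal numbers are assumed to fit in one byte.

```cpp
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

// Hypothetical pipe shared by the handler (write end) and a normal thread (read end).
static int g_signalPipe[2] = { -1, -1 };

// Handler: restricts itself to write(), which is on the async-signal-safe list.
extern "C" void forwardSignal(int sig)
{
    int savedErrno = errno;                  // do not disturb the interrupted code
    unsigned char byte = static_cast<unsigned char>(sig);
    ssize_t ignored = write(g_signalPipe[1], &byte, 1);
    (void)ignored;
    errno = savedErrno;
}

// Creates the pipe on first use and installs the handler for one signal.
bool installForwarder(int sig)
{
    if (g_signalPipe[0] == -1 && pipe(g_signalPipe) != 0)
        return false;

    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = forwardSignal;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESTART;            // restart interrupted system calls
    return sigaction(sig, &action, NULL) == 0;
}

// Called from ordinary code: blocks until a forwarded signal number arrives.
// Returns the signal number, or -1 on error.
int readForwardedSignal()
{
    unsigned char byte = 0;
    ssize_t n;
    do {
        n = read(g_signalPipe[0], &byte, 1);
    } while (n == -1 && errno == EINTR);
    return n == 1 ? byte : -1;
}
```

In an application that already multiplexes I/O, the read end of the pipe would typically be added to the existing `select()` or `poll()` set instead of being read in a blocking call.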
Dummy00001
`sem_post()` is not usable, IMHO; you could miss individual signals. A pipe and `write()` is usable, but then you still have to have a thread waiting for the "signal". Using `sigwait()` in a dedicated thread is much simpler.
wilx
@wilx: How can you miss events with `sem_post`??? This is a semaphore, aka an atomic counter, explicitly mandated to be signal safe. A pipe and `write()` are fine most of the time, because one normally has threads which do IO multiplexing anyway, and one more file descriptor never harms. The two cover 100% of how I have handled signals in countless MT applications. And it is precisely `sigwait()` which, if used *even* *slightly* incorrectly, might miss events. For example, see the question at the top of the page.
Dummy00001
@wilx: Another non-obvious point. One needs the `sigaction()` anyway - for the added benefit of `SA_RESTART`. Unless of course one plans to debug the whole application forever to make handling of `EINTR` graceful.
Dummy00001
@Dummy00001: If you block the signals then you do not get EINTR. As for `sem_post()`, the counter is limited, you cannot keep raising it forever in the handler. At some point you cannot raise it any further and that's where you start missing the signals.
wilx
@wilx: good point about blocked signals. Though overflow of the semaphore is practically unrealistic: you forget that signal queues are very, very limited (if available at all) and multiple signals get lost anyway before they even reach the application, long before the semaphore may overflow.
Dummy00001
+1  A: 

You should block the signals in the other threads by using pthread_sigmask. That page also contains an example of a program with a signal-handling thread.

Hasturkun
You are correct about the function to use. I accepted the previous answer because it was posted before yours, although it mentions the call for single-threaded programs.
JoaoSantos
@JoaoSantos: `pthread_sigmask()` is equivalent to `sigprocmask()`. There are no "single-threaded" or "multi-threaded" signal functions; there are functions which have defined behavior in MT applications (like the two) and functions which have undefined behavior (e.g. `sigpause()` may be used only in ST apps, or shouldn't be used at all).
Dummy00001
+1  A: 

About the only way to handle signals well in a multithreaded application is the following:

  1. Block all signals in main() early, before any other threads are spawned, using pthread_sigmask().
  2. Spawn a signal-handling thread. Use sigwait() or sigwaitinfo() to handle the signals in a simple loop.

This way no threads except the one dedicated for signal handling will get the signals. Also, since the signal delivery is synchronous this way, you can use any inter-thread communication facilities you have, unlike inside classic signal handlers.
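
The two steps can be sketched roughly as follows, using the same four signals as the code in the question. This is an illustration under assumptions rather than a drop-in fix: the names `g_handledSignals`, `g_lastSignal`, `signalThread`, `startSignalThread` and `lastSignal` are made up, and treating SIGHUP as "keep waiting" is only an example policy. On Solaris the two-argument `sigwait()` needs `-D_POSIX_PTHREAD_SEMANTICS`, as in the compile line from the question.

```cpp
#include <pthread.h>
#include <signal.h>

// Hypothetical globals: the set of signals owned by the dedicated thread,
// and the last signal it saw (read it only after joining the thread).
static sigset_t g_handledSignals;
static int      g_lastSignal = 0;

// Step 2: the dedicated thread receives the signals synchronously.
static void* signalThread(void*)
{
    for (;;)
    {
        int sig = 0;
        // sigwait() returns 0 on success and stores the signal number in sig.
        if (sigwait(&g_handledSignals, &sig) != 0)
            break;
        g_lastSignal = sig;      // ordinary code: logging, locking, cleanup are fine here
        if (sig != SIGHUP)       // example policy: SIGHUP means "reload and keep waiting"
            break;
    }
    return NULL;
}

// Step 1 and step 2 together. Call this from main() before creating any other
// thread, so that every later thread inherits the blocked mask.
bool startSignalThread(pthread_t* thread)
{
    sigemptyset(&g_handledSignals);
    sigaddset(&g_handledSignals, SIGQUIT);
    sigaddset(&g_handledSignals, SIGTERM);
    sigaddset(&g_handledSignals, SIGINT);
    sigaddset(&g_handledSignals, SIGHUP);
    if (pthread_sigmask(SIG_BLOCK, &g_handledSignals, NULL) != 0)
        return false;
    return pthread_create(thread, NULL, signalThread, NULL) == 0;
}

// Accessor for the last signal handled; valid after pthread_join() on the thread.
int lastSignal()
{
    return g_lastSignal;
}
```

Because the signals are blocked before the thread is created, a signal that arrives before the thread reaches `sigwait()` simply stays pending and is picked up by the next call.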

wilx