I am writing a system-critical program for a Linux distribution that I am developing. It needs to restart itself on receiving certain signals, to try to avoid crashing. The problem is, after the restart, I cannot re-enable that signal. That is, the signal cannot be received twice. After execv()'ing itself, when the new process calls signal() to set up the signal, SIG_DFL is returned. Every time. Even if I call it twice in a row -- indicating that it was never set in the first place. Is some weird flag being carried over from the original process?

+1  A: 

Signal handlers aren't inherited across exec: exec replaces your whole address space, so any handler that wasn't reset would point to the wrong place. The only dispositions that survive are SIG_IGN and SIG_DFL, which don't depend on the address space of the pre-exec process.

Chris Jester-Young
That's what I would think. What I was saying is that after execv'ing, the new process cannot set the signal again -- not that it doesn't carry over. No matter what I do, after execv'ing a process from a signal handler, I cannot set that same signal again.
c4757p
Do you have some code on hand that I can test with? I'm really curious about this now.
Chris Jester-Young
Sure. Try to 'kill -10' it. You can only do it once -- it ignores all subsequent attempts. http://www.box.net/shared/tgfaef5l8i
c4757p
Thanks for that. I also wrote my own implementation (before your last comment), and I've observed the same (with SIGSEGV, in my case): http://codepad.org/ifh53S3R So, I guess further kernel exploration is required...
Chris Jester-Young
I'm going to take a peek at the code for sysvinit -- I believe this has a similar re-exec system.
c4757p
sysvinit did the exact same thing I was trying to do, and did it right. Apparently it required using the POSIX signal function, sigaction(), instead of the ANSI one, signal(). This one works: http://www.box.net/shared/mdcvz5l5nz
c4757p
+4  A: 

You are falling foul of the fact that you are essentially trying to recursively handle a signal.

When you register a handler with signal() (glibc uses BSD semantics by default), that signal number is blocked while the handler runs: the kernel adds it to the signal mask when the handler is invoked and restores the previous mask when the handler returns. As you never return from the signal handler (instead you execl a new binary), SIGUSR1 stays blocked. The signal mask is preserved across exec, so the new image starts with SIGUSR1 already blocked and never catches it a second time.

This can be seen by examining /proc/<pid>/status before and after you send the first SIGUSR1.

Before:

$ cat /proc/<pid>/status | grep -E "Sig(Cgt|Blk)"
SigBlk: 0000000000000000
SigCgt: 0000000000000200

After:

$ cat /proc/<pid>/status | grep -E "Sig(Cgt|Blk)"
SigBlk: 0000000000000200
SigCgt: 0000000000000200

Note that SigCgt shows that signal 10 has a handler installed. The value is a hex bitmask in which bit N-1 (counting from 0) stands for signal N, so 0x200 = 1 << 9 is signal 10, which is SIGUSR1 on x86 and ARM (see signal(7) for the numbers). SigBlk is empty before SIGUSR1 is sent to your process, but after sending the signal it contains SIGUSR1.

You have two ways to solve this:

a) Manually unblock SIGUSR1 before calling execl in the signal handler:

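sigset_t sigs;
sigprocmask(SIG_BLOCK, NULL, &sigs);    /* read the current mask */
sigdelset(&sigs, SIGUSR1);              /* drop SIGUSR1 from it */
sigprocmask(SIG_SETMASK, &sigs, NULL);  /* install it; the third argument is required */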

b) Use sigaction() with the SA_NODEFER flag instead of signal() to register the signal handler. This prevents SIGUSR1 from being blocked inside the signal handler:

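struct sigaction act;
memset(&act, 0, sizeof act);   /* clear fields we don't set */
act.sa_handler = signalhandler;
sigemptyset(&act.sa_mask);     /* sa_mask is a sigset_t, not an integer */
act.sa_flags = SA_NODEFER;
sigaction(SIGUSR1, &act, NULL);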
Dave Rigby
I discovered that I had to unblock it -- I was idiotically assuming that it didn't block. Anyway, I tried NODEFER, and that didn't work, but unblocking it in the handler before the exec call works. Thank you.
c4757p