I have a fairly simple Perl script that, in one function, does the following:

    if ( legato_is_up() ) {
        write_log("INFO:        Legato is up and running. Continue the installation.");
        $wait_minutes = $WAITPERIOD + 1;
        $legato_up = 1;
    }
    else {
        my $towait = $WAITPERIOD - $wait_minutes;
        write_log("INFO:        Legato is not up yet. Waiting for another $towait minutes...");
        sleep 30;
        $wait_minutes = $wait_minutes + 0.5;
    }

For some reason, the script sometimes gets killed (roughly 1 in 3 runs). I don't know what is responsible for the kill; I only know that it happens during the "sleep" call.

Can anyone give me a hint here? After the script is killed, its job is not done, which is a big problem.

Thanks.

+1  A: 

Without knowing what else is running on your system, it's anybody's guess. You could add a signal handler, though it would only tell you which signal arrived (and when), not who sent it:

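    foreach my $signal (qw(INT PIPE HUP))
    {
        # Remember whatever was installed before so it can still run.
        my $old_handler = $SIG{$signal};
        $SIG{$signal} = sub {
            print time, ": ", $signal, " received!\n";
            # Chain only to a real code ref; a %SIG value can also be
            # the string 'DEFAULT' or 'IGNORE', which must not be called.
            $old_handler->(@_) if ref($old_handler) eq 'CODE';
        };
    }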

You may also want to add $SIG{__WARN__} and $SIG{__DIE__} handlers if you are not logging output from stderr.
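A minimal sketch of the warn and die handlers suggested above. In Perl these hooks are $SIG{__WARN__} and $SIG{__DIE__}. The log_line helper and the $LOG_FH handle are hypothetical stand-ins for the script's own write_log, which is not shown in the question:

```perl
use strict;
use warnings;

# Hypothetical log destination; the real script would reuse its own write_log().
our $LOG_FH = \*STDERR;

sub log_line {
    my ($level, $message) = @_;
    chomp(my $text = "$message");    # stringify in case an object was thrown
    print {$LOG_FH} time, ": $level: $text\n";
}

# Record every warning instead of letting it vanish along with stderr.
$SIG{__WARN__} = sub { log_line('WARN', $_[0]) };

# Record fatal errors. $^S is true inside an eval (where the exception may
# still be caught) and undefined while compiling, so skip both cases.
$SIG{__DIE__} = sub {
    return if !defined $^S || $^S;
    log_line('DIE', $_[0]);
};
```

These hooks only see Perl-level warnings and exceptions. A process terminated by a signal never reaches them, so they complement the signal handlers rather than replace them.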

Ether
There's nothing else. I have a number of other scripts that call sleep, but only this one gets killed...
Alex
Isn't there a Perl module for advanced signal handling, which *can* tell you who sent the signal?
Zan Lynx
@Zan I don't think Unix makes that information available. Glancing through the GNU C Library docs on signal handling, their handler just gets the signal number. http://www.gnu.org/s/libc/manual/html_node/Basic-Signal-Handling.html
Schwern
@Alex: I can guarantee you that there are other processes running on your system :) Perhaps not by you, but root always has several.
Ether
@Schwern: Look at the man page for sigaction. There's a big section on siginfo_t. One of the struct members is si_pid. You get this extra info with the SA_SIGINFO flag.
Zan Lynx
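A rough Perl sketch of the SA_SIGINFO approach described in the comment above, using the core POSIX module. According to the POSIX module documentation, a handler installed with SA_SIGINFO receives the signal name plus a hash reference of siginfo fields. Whether the pid and uid keys are actually filled in depends on the platform and the Perl build, so the code treats them as optional. The describe_signal helper is a hypothetical name:

```perl
use strict;
use warnings;
use POSIX qw(SIGHUP SIGINT SIGTERM SA_SIGINFO);

# Build a log line from the handler arguments. The siginfo hash may be
# missing, or may lack pid/uid, on some systems, hence the fallbacks.
sub describe_signal {
    my ($name, $info) = @_;
    my $has_info = ref($info) eq 'HASH';
    my $pid = $has_info && defined $info->{pid} ? $info->{pid} : 'unknown';
    my $uid = $has_info && defined $info->{uid} ? $info->{uid} : 'unknown';
    return sprintf '%d: SIG%s received from pid %s (uid %s)',
        time, $name, $pid, $uid;
}

# One action object, shared by every signal we want to trace.
my $action = POSIX::SigAction->new(
    sub { print STDERR describe_signal(@_), "\n" },
    POSIX::SigSet->new,
    SA_SIGINFO,
);

foreach my $signum (SIGHUP, SIGINT, SIGTERM) {
    POSIX::sigaction($signum, $action)
        or warn "sigaction($signum) failed: $!\n";
}
```

Because a custom handler replaces the default action, the script keeps running after a traced signal instead of dying. SIGKILL cannot be caught, so a kill -9 would still leave no trace, and the sender's pid is only meaningful for signals sent explicitly by another process.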
@Ether: oh yes, definitely, but I mean this is a normal system, nothing should kill the script. A script being always killed during its sleep() call and nothing else is strange, wouldn't you say?
Alex
@Alex: I would guardedly say yes (guardedly only because I know nothing about your system). Strange things can happen though; e.g. processes can mysteriously vanish if you run out of memory.
Ether
@Ether: you were right after all :-) There's something wacky on the system that kills almost all processes that have anything to do with Legato (the clustering software): running scripts were killed, tail on the system log was killed, tail on my log was killed. Weird. Now I need to look for a way to do it without killing the system :-)
Alex