For the example code in section 10.6, the expected result is that after several iterations the static structure used by getpwnam gets corrupted and the program terminates with a SIGSEGV signal.

But on my platform (Fedora 11, gcc (GCC) 4.4.0), the result is:

[Langzi@Freedom apue]$ ./corrupt
in sig_alarm

I see the output from sig_alarm only once, and then the program seems to hang for some reason; the process still exists and is still running.
But when I run the program under gdb, it seems fine: I see the output from sig_alarm at regular intervals.

According to my manual, the signal disposition is reset to SIG_DFL after the signal is handled, and the system does not block the signal while the handler runs. So I reinstall the handler at the beginning of my signal handler.

Maybe I should use sigaction instead, but I only want to know why the behavior differs between a normal run and a run under gdb.

Any advice and help will be appreciated.

Here is my code:

#include "apue.h"
#include <pwd.h>

void sig_alarm(int signo);

int main()
{
  struct passwd *pwdptr;
  signal(SIGALRM, sig_alarm);

  alarm(1);
  for(;;) {
    if ((pwdptr = getpwnam("Zhijin")) == NULL)
      err_sys("getpwnam error");
    if (strcmp("Zhijin", pwdptr->pw_name) != 0) {
      printf("data corrupted, pw_name: %s\n", pwdptr->pw_name);
    }
  }
}

void sig_alarm(int signo)
{
  signal(SIGALRM, sig_alarm);
  struct passwd *rootptr;
  printf("in sig_alarm\n");

  if ((rootptr = getpwnam("root")) == NULL)
    err_sys("getpwnam error");
  alarm(1);
}
+1  A: 

According to the C standard, you're really not allowed to do much in a signal handler. All you are guaranteed to be able to do in the signal-handling function, without causing undefined behavior, is to call signal and to assign a value to a static object of type volatile sig_atomic_t.
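
POSIX is somewhat more generous than the C standard: it lists a set of async-signal-safe functions, including write, that may be called from a handler. As a minimal sketch (the names safe_sig_alarm, alarm_count, install_safe_handler and check_safe_handler are made up for illustration and are not from the book), a handler that wants to print something could use write instead of printf and save and restore errno around it:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical counter, only ever accessed as a sig_atomic_t. */
static volatile sig_atomic_t alarm_count = 0;

/* Handler restricted to async-signal-safe work: no printf, no getpwnam. */
static void safe_sig_alarm(int signo)
{
    static const char msg[] = "in sig_alarm\n";
    int saved_errno = errno;          /* write may change errno */

    (void)signo;
    alarm_count = alarm_count + 1;
    if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing safe to do about a failed write here */
    }
    errno = saved_errno;              /* hide the change from main */
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_safe_handler(void)
{
    return signal(SIGALRM, safe_sig_alarm) == SIG_ERR ? -1 : 0;
}

/* Deliver one SIGALRM synchronously and check that the handler ran. */
void check_safe_handler(void)
{
    int before, rc;

    rc = install_safe_handler();
    assert(rc == 0);
    before = alarm_count;
    rc = raise(SIGALRM);              /* returns after the handler has run */
    assert(rc == 0);
    assert(alarm_count == before + 1);
}
```

This only covers the printing; getpwnam is not on the async-signal-safe list, so it still has no place in the handler.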

The first few times I ran this program, on Ubuntu Linux, it looked like your call to alarm in the signal handler didn't work, so the loop in main just kept running after the first alarm. When I tried it later, the program ran the signal handler a few times, and then hung. All this is consistent with undefined behavior: the program fails, sometimes, and in various more or less interesting ways.

It is not uncommon for programs that have undefined behavior to work differently in the debugger. The debugger is a different environment, and your program and data could for example be laid out in memory in a different way, so errors can manifest themselves in a different way, or not at all.

I got the program to work by adding a variable:

volatile sig_atomic_t got_interrupt = 0;

And then I changed your signal handler to this very simple one:

void sig_alarm(int signo) {
    got_interrupt = 1;
}

And then I inserted the actual work into the infinite loop in main:

if (got_interrupt) {
    got_interrupt = 0;
    signal(SIGALRM, sig_alarm);
    struct passwd *rootptr;
    printf("in sig_alarm\n");

    if ((rootptr = getpwnam("root")) == NULL)
        perror("getpwnam error");
    alarm(1);
}
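
Put together, the pieces above might look like the following sketch. The function run_lookup_loop and its parameters are hypothetical, and the loop is bounded by a tick count only so that the example terminates; the original program loops forever. It assumes the looked-up user exists on the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t got_interrupt = 0;

static void sig_alarm(int signo)
{
    (void)signo;
    got_interrupt = 1;                /* the only work done in the handler */
}

/*
 * Hypothetical helper: look up `name` in a tight loop until `ticks`
 * alarms have been processed. Returns the number of lookups whose
 * pw_name did not match, or -1 on error.
 */
int run_lookup_loop(const char *name, int ticks)
{
    int handled = 0, corrupted = 0;

    assert(name != NULL && ticks > 0);
    if (signal(SIGALRM, sig_alarm) == SIG_ERR)
        return -1;
    alarm(1);

    while (handled < ticks) {
        struct passwd *pwdptr = getpwnam(name);
        if (pwdptr == NULL) {
            alarm(0);                 /* cancel the pending alarm */
            return -1;
        }
        if (strcmp(name, pwdptr->pw_name) != 0)
            corrupted++;

        if (got_interrupt) {          /* deferred work, outside the handler */
            got_interrupt = 0;
            signal(SIGALRM, sig_alarm);
            printf("alarm handled in main loop\n");
            if (getpwnam("root") == NULL)
                perror("getpwnam error");
            handled++;
            if (handled < ticks)
                alarm(1);
        }
    }
    return corrupted;
}
```

Because the second getpwnam call now runs between two complete calls in the main flow rather than in the middle of one, the name check is not expected to fail here.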

I think the "apue" you mention is the book "Advanced Programming in the UNIX Environment", which I don't have here, so I don't know if the purpose of this example is to show that you shouldn't mess around with things inside of a signal handler, or just that signals can cause problems by interrupting the normal work of the program.

Thomas Padron-McCarthy
Thanks for your reply. The purpose of the example is to show that if we call a nonreentrant function from a signal handler, the results are unpredictable. Nonreentrant functions are those that a) use static data structures, b) call malloc or free, or c) are part of the standard I/O library, since the library uses global data structures in a nonreentrant way. We should also save and restore errno. The function getpwnam uses a static structure, so the call in the main loop may find its internal data pointer corrupted when the signal handler calls the same function, and the program will crash.
OnTheEasiestWay
And following your advice, the call to alarm works. But the expected result of the example doesn't appear. The example is meant to demonstrate the wrong way to write a signal handler, so I just want to know why it goes wrong. Thanks again.
OnTheEasiestWay
Yes, the example shows that you get undefined behavior. I assume that the intention was that the string **pw_name** should sometimes turn out to be wrong. But it seems to me that it actually has shown more than that, namely that undefined behavior is just that, undefined, and if you expect it to do a specific thing, it might instead do something completely different. Just be glad it didn't melt the computer, or send rude e-mail to your mother!
Thomas Padron-McCarthy
+1  A: 

According to the spec, the function getpwnam is not reentrant and is not guaranteed to be thread-safe. Since you are accessing the structure from two different flows of control (a signal handler interrupts the main program asynchronously, much like a separate thread would), you are running into this issue. Whenever you have concurrent or parallel execution (as when using pthreads or when using a signal handler), you must be sure not to modify shared state (e.g. the structure owned by getpwnam), and if you do, then appropriate synchronization must be used: locking between threads, or blocking the signal around the critical section when a signal handler is involved.

Additionally, the signal function is discouraged in favor of the sigaction function, because its semantics vary between systems. To ensure portable behavior when registering signal handlers, you should use sigaction.

Using the sigaction function, you can set the SA_RESETHAND flag to have the disposition reset to the default when the handler is entered. You can also use the sigprocmask function to block and unblock the delivery of signals without modifying their handlers.
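
A rough sketch of both ideas follows. The names on_alarm, alarm_seen, install_alarm_handler and lookup_name_blocked are invented for this illustration. The second function blocks SIGALRM while getpwnam runs and copies the result out of the static structure before unblocking, so a handler cannot interrupt the lookup halfway through.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t alarm_seen = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_seen = 1;
}

/*
 * Install on_alarm for SIGALRM with sigaction. If one_shot is nonzero,
 * SA_RESETHAND restores the default disposition on handler entry.
 * Returns 0 on success, -1 on failure.
 */
int install_alarm_handler(int one_shot)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = one_shot ? SA_RESETHAND : 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/*
 * Look up `user` with SIGALRM blocked and copy pw_name into buf.
 * Returns 0 on success, -1 if the lookup or the masking fails.
 */
int lookup_name_blocked(const char *user, char *buf, size_t buflen)
{
    sigset_t block, old;
    struct passwd *p;
    int rc = -1;

    assert(buf != NULL && buflen > 0);
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    p = getpwnam(user);               /* cannot be interrupted by SIGALRM */
    if (p != NULL) {
        strncpy(buf, p->pw_name, buflen - 1);
        buf[buflen - 1] = '\0';
        rc = 0;
    }

    /* Restore the previous mask; a pending SIGALRM is delivered now. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return rc;
}
```

Blocking the signal protects only the code between the two sigprocmask calls; a handler that itself calls getpwnam would still overwrite the static structure between lookups, which is why the caller gets a copy.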

Michael Aaron Safyan