I have some processes showing up as <defunct> in top (and ps). I've boiled things down from the real scripts and programs.

In my crontab:

* * * * * /tmp/launcher.sh /tmp/tester.sh

The contents of launcher.sh (which is of course marked executable):

#!/bin/bash
# the real script does a little argument processing here
"$@"

The contents of tester.sh (which is of course marked executable):

#!/bin/bash
sleep 27 & # the real script launches a compiled C program in the background

ps shows the following:

user       24257 24256  0 18:32 ?        00:00:00 [launcher.sh] <defunct>
user       24259     1  0 18:32 ?        00:00:00 sleep 27

Note that tester.sh does not appear--it has exited after launching the background job.

Why does launcher.sh stick around, marked <defunct>? It only seems to do this when launched by cron--not when I run it myself.

Additional note: launcher.sh is a common script in the system this runs on, which is not easily modified. The other things (crontab, tester.sh, even the program that I run instead of sleep) can be modified much more easily.

+3  A: 

Because they haven't been the subject of a wait(2) system call.

Since the parent may still wait for these processes in the future, the kernel can't completely get rid of them: it has to keep the exit status and a record that the process existed, or a later wait system call would have nothing to return.
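As a rough illustration of a child that is never waited for, here is a minimal sketch. It assumes bash and a ps that accepts `-o stat= -p`; the temp file and variable names are made up for the demo, and this is not the mechanism cron uses. The inner shell starts a child and then replaces itself with sleep, which never calls wait, so the child sits in state Z until its parent goes away.

```bash
#!/bin/bash
# Hypothetical demo: names and timings are illustrative only.
pidfile=$(mktemp)

# The inner shell starts a short-lived child, records its PID, then execs
# into `sleep 2`. The child outlives the exec, and sleep never calls wait(2).
bash -c 'sleep 0.3 & echo $! > "$0"; exec sleep 2' "$pidfile" &

sleep 1                                  # child has exited, parent is still alive
child=$(cat "$pidfile")
state=$(ps -o stat= -p "$child")         # state of the unreaped child
echo "child $child state: $state"

wait                                     # let the demo parent finish
rm -f "$pidfile"
```

Once the parent exits, the zombie is handed to init (or whatever reaper the system uses) and cleaned up.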

When you start one from the shell, your shell is trapping SIGCHLD and doing various wait operations anyway, so nothing stays defunct for long.
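For comparison, this is roughly what "doing a wait" looks like from a script: a sketch, assuming bash, where the exit code 7 is an arbitrary value chosen for the demo. The `wait` builtin collects the child's exit status, which is what lets the kernel drop the process entry.

```bash
#!/bin/bash
# Start a background child that exits with an arbitrary status (7).
(sleep 0.2; exit 7) &
child=$!

# Reap the child and capture its exit status.
status=0
wait "$child" || status=$?
echo "child $child exited with status $status"
```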

But cron isn't in a wait state; it is sleeping, so the defunct child may stick around for a while until cron wakes up.


Update: Responding to comment... Hmm. I did manage to duplicate the issue:

 PPID   PID  PGID  SESS COMMAND
    1  3562  3562  3562 cron
 3562  1629  3562  3562  \_ cron
 1629  1636  1636  1636      \_ sh <defunct>
    1  1639  1636  1636 sleep

So, what happened was, I think:

  • cron forks and cron child starts shell
  • shell (1636) becomes leader of session and process group 1636 and starts sleep
  • shell exits, SIGCHLD sent to its parent, the cron child 1629 (the PPID shown above)
  • signal is ignored or mishandled
  • shell turns zombie. Note that sleep is reparented to init, so when the sleep exits, init will get the signal and clean up. I'm still trying to figure out when the zombie gets reaped. Probably, with no active children, cron 1629 figures out it can exit; at that point the zombie will be reparented to init and get reaped. So now we wonder about the missing SIGCHLD that cron should have processed.
    • It isn't necessarily vixie cron's fault. As you can see here, libdaemon installs a SIGCHLD handler during daemon_fork(), and this could interfere with signal delivery on a quick exit by intermediate 1629

      Now, I don't know if vixie cron on my Ubuntu system is even built with libdaemon, but at least I have a new theory. :-)
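To check a theory like this on a live system, it helps to list only the zombies together with their parent, process group, and session. A small sketch, assuming a ps that understands these `-o` column names (procps does); the awk filter keeps the header line and any row whose state starts with Z.

```bash
#!/bin/bash
# Show the header plus every process whose state (5th column) starts with Z.
zombies=$(ps -eo ppid,pid,pgid,sess,stat,comm | awk 'NR == 1 || $5 ~ /^Z/')
echo "$zombies"
```

The PPID column shows which process is expected to call wait for each zombie.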

DigitalRoss
It actually will stick around all day, not just until cron wakes up. Can you comment on that? The real program I run (not sleep) runs for hours and hours.
John Zwinck
+3  A: 

I’d recommend that you solve the problem by simply not having two separate processes: Have launcher.sh do this on its last line:

exec "$@"

This will eliminate the superfluous process.
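A quick way to see that exec replaces the process instead of creating a new one: a sketch, assuming bash, in which the wrapper prints its own PID and then execs a second shell that prints its PID too. The variable names are just for the demo.

```bash
#!/bin/bash
# The wrapper shell prints its PID, then execs a second shell that prints its own.
pids=$(bash -c 'echo $$; exec bash -c "echo \$\$"')

wrapper_pid=$(echo "$pids" | sed -n 1p)
target_pid=$(echo "$pids" | sed -n 2p)
echo "wrapper PID: $wrapper_pid, target PID: $target_pid"
```

Both numbers come out the same, because the exec'd program takes over the wrapper's process, so no intermediate process is left behind to become a zombie.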

Teddy
I think you're right, but I can't easily do that because `launcher.sh` is used by many things, some of which would break if I made this change. I *might* consider making a new launcher script that does exec and leaving the other version intact, but this is rather distasteful.
John Zwinck
@John Zwinck: I cannot imagine in what circumstances things would break if you made this change. It's effectively the same thing with one less process.
Teddy
@Teddy: the thing that would break is that some people do this in an interactive shell: `. launcher.sh foo bar`. If the launcher did `exec`, the user's shell would terminate upon completion of the launched program. I know it's a strange use case, but that's how it is in the existing system.
John Zwinck
@John Zwinck: The script could be rewritten to detect if it was started or sourced, and act accordingly.
Teddy
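One possible shape for that detection, as a sketch only: it assumes bash (it relies on `BASH_SOURCE`), and the temp-file launcher here is a stand-in for the real launcher.sh, whose argument processing is not shown. When the file is sourced, `BASH_SOURCE[0]` differs from `$0`, so the command runs normally; when executed, it uses exec.

```bash
#!/bin/bash
# Hypothetical stand-in for launcher.sh, written to a temp file for the demo.
launcher=$(mktemp)
cat > "$launcher" <<'EOF'
#!/bin/bash
if [[ "${BASH_SOURCE[0]}" != "$0" ]]; then
    "$@"          # sourced: run the command, keep the caller's shell alive
else
    exec "$@"     # executed: replace this process with the command
fi
EOF

# Executed: the command replaces the launcher process.
executed=$(bash "$launcher" echo ran-via-exec)

# Sourced: the calling shell survives and continues afterwards.
sourced=$(bash -c '. "$1" echo ran-in-caller; echo caller-still-alive' _ "$launcher")

echo "$executed"
echo "$sourced"
rm -f "$launcher"
```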
+2  A: 

I suspect that cron is waiting for all subprocesses in the session to terminate. See wait(2) with respect to negative pid arguments. You can see the SESS with:

ps faxo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm

Here's what I see (edited):

STAT  EUID  RUID TT       TPGID  SESS  PGRP  PPID   PID %CPU COMMAND
Ss       0     0 ?           -1  3197  3197     1  3197  0.0 cron
S        0     0 ?           -1  3197  3197  3197 18825  0.0  \_ cron
Zs    1000  1000 ?           -1 18832 18832 18825 18832  0.0      \_ sh <defunct>
S     1000  1000 ?           -1 18832 18832     1 18836  0.0 sleep

Notice that the sh and the sleep are in the same SESS.

Use the command setsid(1). Here's tester.sh:

#!/bin/bash
setsid sleep 27 # the real script launches a compiled C program in the background

Notice you don't need &, setsid puts it in the background.
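To confirm that a command started through setsid really ends up in a different session, here is a small sketch. It assumes the util-linux `setsid` command and a ps that accepts `-o sid=`; it only compares session IDs and does not show anything about how cron behaves.

```bash
#!/bin/bash
# Session ID of this script's shell.
own_sid=$(ps -o sid= -p $$ | tr -d ' ')

# Session ID seen by a shell started through setsid.
new_sid=$(setsid bash -c 'ps -o sid= -p $$' | tr -d ' ')

echo "own session: $own_sid, session under setsid: $new_sid"
```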

bstpierre
Doing this causes `launcher.sh` and `tester.sh` to stick around. I'd like them both to terminate (at least with my original situation, `tester.sh` does terminate--with `setsid` it doesn't, which I don't want).
John Zwinck
That's odd, both launcher and tester terminate when I run it here. (Almost immediately -- I have yet to take a ps snapshot where I see them running.)
bstpierre
I am using Ubuntu Hardy 64-bit. What about you?
John Zwinck
Oh, and I have `SHELL=/bin/bash` at the top of my `crontab`.
John Zwinck
Ubuntu jaunty 32. No bash in my crontab. cron 3.0pl1-105ubuntu1.1
bstpierre