I'm having a weird problem with some Python processes that are run by a watchdog process.

The watchdog process is written in Python and is the parent. It has a function called *start_child(name)* which uses subprocess.Popen to open the child process. The Popen object is recorded so that the watchdog can monitor the process using poll() and eventually end it with terminate() when needed. If the child dies unexpectedly, the watchdog calls *start_child(name)* again and records the new Popen object.

There are 7 child processes, all of which are also written in Python. If I run any of the children manually, I can send SIGTERM or SIGINT using kill and get the results I expect (the process ends).

However, when run from the watchdog process, the child will only end after the FIRST signal. When the watchdog restarts the child, the new child process no longer responds to SIGTERM or SIGINT. I have no idea what is causing this.

watchdog.py

import subprocess
import threading
import time

class watchdog:
    # <snip> various init stuff

    def start(self):
        self.running = True

        kids = ['app1', 'app2', 'app3', 'app4', 'app5', 'app6', 'app7']
        self.processes = {}

        # initial launch happens in the main thread
        for kid in kids:
            self.start_child(kid)

        self.thread = threading.Thread(target=self._monitor)
        self.thread.start()

        while self.running:
            time.sleep(10)

    def start_child(self, name):
        try:
            proc = subprocess.Popen(name)
            self.processes[name] = proc
        except OSError:  # raised when the executable cannot be started
            print("oh no")
        else:
            print("started child ok")

    def _monitor(self):
        while self.running:
            time.sleep(1)
            if self.running:
                for kid, proc in self.processes.items():
                    if proc.poll() is not None:  # process ended
                        # restart happens in the monitor thread
                        self.start_child(kid)

So what happens is watchdog.start() launches all 7 processes, and if I send any process SIGTERM, it ends, and the monitor thread starts it again. However, if I then send the new process SIGTERM, it ignores it.

I should be able to keep sending kill -15 to the restarted processes over and over again. Why do they ignore it after being restarted?

A: 

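As explained here: http://blogs.gentoo.org/agaffney/2005/03/18/python_sucks, when Python creates a new thread, it blocks all signals for that thread (and for any processes that thread spawns). That is why the children started from the main thread respond to signals, while the ones restarted from the monitor thread do not.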

I fixed this using sigprocmask, called through ctypes. This may or may not be the "correct" way to do it, but it does work.

In the child process, during __init__:

import ctypes

libc = ctypes.cdll.LoadLibrary("libc.so")
mask = '\x00' * 17 # 16 byte empty mask + null terminator
libc.sigprocmask(3, mask, None) # '3' on FreeBSD is the value for SIG_SETMASK
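
For anyone on Python 3.3 or later (an assumption, since the code in this thread is Python 2), the standard library exposes the same operation as signal.pthread_sigmask, so ctypes should not be needed. A minimal sketch; clear_signal_mask is a hypothetical helper name, meant to be called early in the child's startup:

```python
import signal

def clear_signal_mask():
    """Unblock every signal for the calling thread and return the old mask."""
    # SIG_SETMASK replaces the whole mask; an empty set blocks nothing.
    return signal.pthread_sigmask(signal.SIG_SETMASK, set())

if __name__ == "__main__":
    previously_blocked = clear_signal_mask()
    print("signals that were blocked:", sorted(previously_blocked))
```

The returned set is handy for logging which signals the child inherited as blocked.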
gdm
Mixing any two of fork/exec, threads, and signals is difficult to get right. Mixing all three is a recipe for disaster.
Miles
Did I mention that the watchdog process itself is a daemon process which forks several times in order to detach itself? A _delicious_ disaster.
gdm
sigprocmask() is now scheduled for Python 3.2: <http://bugs.python.org/issue8407>
Martin Carpenter
A: 

Wouldn't it be better to restore the default signal handlers within Python rather than via ctypes? In your child process, use the signal module:

import signal
for sig in range(1, signal.NSIG):
    try:
        signal.signal(sig, signal.SIG_DFL)
    except RuntimeError:
        pass

RuntimeError is raised when trying to set signals such as SIGKILL which can't be caught.

mhawke
This doesn't work because all signals are masked. Regardless of what you do with signal.signal(), the process will never receive the signal. I do actually use signal.signal() to set my handlers for SIGTERM (so I can clean up on quit), but you still need to use sigprocmask to allow the process to see SIGTERM.
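
The difference between a handler and the mask can be seen in a small experiment. This is only a sketch, assuming Python 3.3+ on a Unix system and that it runs in the main thread; demo and on_term are made-up names:

```python
import signal
import threading

received = []

def on_term(signum, frame):
    # Python-level handler: just record that the signal arrived.
    received.append(signum)

def demo():
    old_handler = signal.signal(signal.SIGTERM, on_term)
    # Block SIGTERM for this thread, as happens to the restarted children.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    try:
        # Send SIGTERM to this very thread.
        signal.pthread_kill(threading.get_ident(), signal.SIGTERM)
        pending = signal.SIGTERM in signal.sigpending()
        handled_while_blocked = bool(received)
    finally:
        # Unblocking delivers the pending signal, so the handler runs now.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGTERM})
    signal.signal(signal.SIGTERM, old_handler)
    return pending, handled_while_blocked, list(received)

if __name__ == "__main__":
    print(demo())
```

While SIGTERM is blocked the signal stays pending and the handler is never called, no matter what was installed with signal.signal(); it only fires once the mask is lifted.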
gdm
@gdm: Sorry about that. I don't know of any way to do this in Python, so calling out via ctypes is probably the only way.
mhawke