I am writing a daemon program that spawns several child processes. After I run the stop script, the main process keeps running when it is intended to quit, which really confuses me.

import daemon, os, signal, time
from multiprocessing import Process, cpu_count, JoinableQueue
from http import httpserv
from worker import work

class Manager:
    """
    This manager starts the http server processes and worker
    processes, and creates the input/output queues that keep the
    processes working together nicely.
    """
    def __init__(self):
        self.NUMBER_OF_PROCESSES = cpu_count()

    def start(self):
        self.i_queue = JoinableQueue()
        self.o_queue = JoinableQueue()

        # Create worker processes
        self.workers = [Process(target=work,
                                args=(self.i_queue, self.o_queue))
                        for i in range(self.NUMBER_OF_PROCESSES)]
        for w in self.workers:
            w.daemon = True
            w.start()

        # Create the http server process
        self.http = Process(target=httpserv, args=(self.i_queue, self.o_queue))
        self.http.daemon = True
        self.http.start()

        # Keep the current process from returning
        self.running = True
        while self.running:
            time.sleep(1)

    def stop(self):
        print "quitting ..."

        # Stop accepting new requests from users
        os.kill(self.http.pid, signal.SIGINT)

        # Wait for all requests in the output queue to be delivered
        self.o_queue.join()

        # Put a None sentinel into the input queue to signal the
        # worker processes to terminate
        self.i_queue.put(None)
        for w in self.workers:
            w.join()
        self.i_queue.join()

        # Let the main process return
        self.running = False


manager = Manager()
context = daemon.DaemonContext()
context.signal_map = {
        signal.SIGHUP: lambda signum, frame: manager.stop(),
        }

context.open()
manager.start()

The stop script is just a one-liner, os.kill(pid, signal.SIGHUP). After it runs, the child processes (the worker processes and the http server process) exit nicely, but the main process just stays there, and I don't know what keeps it from returning.
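
For reference, a minimal sketch of what such a stop script might look like, assuming the daemon records its pid in a pidfile (the path below is hypothetical):

import os, signal

# Hypothetical pidfile location -- adjust to wherever your daemon writes its pid
PIDFILE = "/var/run/mydaemon.pid"

pid = int(open(PIDFILE).read().strip())
os.kill(pid, signal.SIGHUP)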

+1  A: 

I tried a different approach, and this seems to work (note that I took out the daemon portions of the code, as I didn't have that module installed).

import signal
from multiprocessing import cpu_count

class Manager:
    """
    This manager starts the http server processes and worker
    processes, and creates the input/output queues that keep the
    processes working together nicely.
    """
    def __init__(self):
        self.NUMBER_OF_PROCESSES = cpu_count()

    def start(self):

        # all your code minus the loop

        print "waiting to die"

        signal.pause()

    def stop(self):
        print "quitting ..."

        # all your code minus self.running


manager = Manager()

signal.signal(signal.SIGHUP, lambda signum, frame: manager.stop())

manager.start()

One warning: signal.pause() will return when any handled signal arrives, not just SIGHUP, so you may want to change your code accordingly.
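
For example, a sketch of one way to guard against stray signals: have the handler set a flag and keep re-pausing until it flips (an abbreviated skeleton, not the full Manager):

import signal

class Manager:
    def __init__(self):
        self.running = True

    def start(self):
        # ... start the workers and the http server here ...
        while self.running:
            # pause() returns whenever any handled signal arrives,
            # so re-check the flag each time around
            signal.pause()

    def stop(self):
        self.running = False

manager = Manager()
signal.signal(signal.SIGHUP, lambda signum, frame: manager.stop())
manager.start()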

EDIT:

The following works just fine for me:

import daemon
import signal
import time

class Manager:
    """
    This manager starts the http server processes and worker
    processes, and creates the input/output queues that keep the
    processes working together nicely.
    """
    def __init__(self):
        self.NUMBER_OF_PROCESSES = 5

    def start(self):

        # all your code minus the loop

        print "waiting to die"
        self.running = 1
        while self.running:
            time.sleep(1)

        print "quit"

    def stop(self):
        print "quitting ..."

        # all your code minus self.running

        self.running = 0


manager = Manager()

context = daemon.DaemonContext()
context.signal_map = {signal.SIGHUP : lambda signum, frame: manager.stop()}

context.open()
manager.start()

What version of Python are you using?

grieve
I don't do the signal handling manually myself; it's handled by the daemon module from http://pypi.python.org/pypi/python-daemon/
btw0
+1  A: 

You create the http server process but don't join() it. What happens if, rather than doing an os.kill() to stop the http server process, you send it a stop-processing sentinel (None, like you send to the workers) and then do a self.http.join()?
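
Something along these lines, for example (just a sketch; it assumes httpserv reads from o_queue and exits when it sees None, which may not match your actual httpserv):

def stop(self):
    print "quitting ..."

    # Hand the http server a sentinel instead of os.kill(), then wait for it
    self.o_queue.put(None)
    self.http.join()

    # ... shut down the workers as before ...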

Update: You also need to send the None sentinel to the input queue once for each worker. You could try:

    for w in self.workers:
        self.i_queue.put(None)
    for w in self.workers:
        w.join()

N.B. The reason you need two loops is that if you put the None into the queue in the same loop that does the join(), that None may be picked up by a worker other than w, so joining on w will cause the caller to block.

You don't show the code for the workers or the http server, so I assume they are well-behaved in terms of calling task_done() etc., and that each worker will quit as soon as it sees a None, without get()-ing any more items from the input queue.
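
For reference, a "well-behaved" worker in that sense might look roughly like this (a sketch only -- your actual work() isn't shown, and process() here is a hypothetical handler):

def work(i_queue, o_queue):
    while True:
        item = i_queue.get()
        if item is None:
            i_queue.task_done()  # balance the final get() so i_queue.join() can return
            break
        o_queue.put(process(item))  # process() is hypothetical
        i_queue.task_done()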

Also, note that there is at least one open, hard-to-reproduce issue with JoinableQueue.task_done(), which may be biting you.

Vinay Sajip
The while True loop in my code was originally self.http.join(), but one process didn't quit, so I replaced it with the while True loop; the process is still not quitting :(
btw0
See my update. A modified version of your script with my changes terminates correctly, at least in my environment.
Vinay Sajip