views:

58

answers:

1

Hi,

I realize this might be a duplicate of http://stackoverflow.com/questions/1191374/subprocess-with-timeout. If it is, I apologize, just wanted to clarify something.

I'm creating a subprocess, which I want to run for a certain amount of time, and if it doesn't complete within that time, I want it to throw an error. Would something along the lines of the following code work or do we have to use a signal like answered in the other question? Thanks in advance!:

def run(self):
    self.runTestCmd()
    self.waitTestComplete(self.timeout)

def runTestCmd(self):
    self.proc = subprocess.Popen("./configure", shell=True)

def waitTestComplete(self, timeout):
    st = time.time() 
    while (time.time()-st) < timeout:
        if self.proc.poll() == 0:
            return True
        else:
            time.sleep(2)
    raise TestError("timed out waiting for test to complete")
+2  A: 

It would, but it has a problem. The process will continue on doing whatever it is you asked it to do even after you've given up on it. You'll have to send the process a signal to kill it once you've given up on it if you really want it to stop.

Since you are spawning a new process (./configure which is presumably a configure script) that in turn creates a whole ton of sub-processes this is going to get a little more complex.

import os

def runTestCmd(self):
    self.proc = subprocess.Popen(["./configure"], shell=False,
                                 preexec_fn=os.setsid)

Then os.kill(-process.pid, signal.SIGKILL) should kill all the sub-processes. Basically what you are doing is using the preexec_fn to cause your new subprocess to acquire it's own session group. Then you are sending a signal to all processes in that session group.

Many processes that spawn subprocesses know that they need to clean up their subprocesses before they die. So it behooves you to try being nice to them if you can. Try os.signal(-process.pid, signal.SIGTERM) first, wait a second or two for the process to exit, then try SIGKILL. Something like this:

import time, os, errno, signal

def waitTestComplete(self, timeout):
    st = time.time() 
    while (time.time()-st) < timeout:
        if self.proc.poll() is not None:  # 0 just means successful exit
            # Only return True if process exited successfully,
            # otherwise return False.
            return self.proc.returncode == 0
        else:
            time.sleep(2)
    # The process may exit between the time we check and the
    # time we send the signal.
    try:
        os.kill(-self.proc.pid, signal.SIGTERM)
    except OSError, e:
        if e.errno != errno.ESRCH:
            # If it's not because the process no longer exists,
            # something weird is wrong.
            raise
    time.sleep(1)
    if self.proc.poll() is None: # Still hasn't exited.
        try:
            os.kill(-self.proc.pid, signal.SIGKILL)
        except OSError, e:
            if e.errno != errno.ESRCH:
                raise
    raise TestError("timed out waiting for test to complete")

As a side note, never, ever use shell=True unless you know for absolute certain that's what you want. Seriously. shell=True is downright dangerous and the source of many security issues and mysterious behavior.

Omnifarious
Thanks for your reply :) I'm running python 2.5 though, which doesn't have pOpen.kill(), and was wondering how I could go about killing the subprocess. I tried http://stackoverflow.com/questions/1064335/in-python-2-5-how-do-i-kill-a-subprocess but that doesn't seem to work.
iman453
@user388025 - That is indeed how you would go about doing it. Except, it's configure, right? That means it spawns a ton of sub-processes, and some of those may take awhile to die off. In order to make that work you might have to do something lots more complicated so those processes end up with their own process group.
Omnifarious
@user388025 - You mean `["make", "test"]`. Never use `shell=True` unless you absolutely have to and know what you're on about. And yes, that will also spawn lots of subprocesses, and I edited my answer to answer your question. You might also want to try sending make the `SIGINTR` or `SIGTERM` signals instead of `SIGKILL` so it has a chance to try to clean up the subprocesses it spawns.
Omnifarious
Oops, deleted my comment accidentally. I'm going to try what you just said, thanks for your time :)
iman453
@user388025 - I edited my answer to solve a number of other problems you might end up encountering and fix up your code to not think a test failure is the same as a timeout.
Omnifarious
Omnifarious, what happens if configure exits after receiving the SIGTERM, but one or more of its children stubbornly remain behind? If I'm not mistaken, poll() would return an exit code and you would never send the SIGKILL signal to those remaining processes.
Marius Gedminas
@Omnifarious, it worked...thanks for the answer and the brilliant explanation :)
iman453
@Marius Gedminas: I believe you might be correct, and I'm not completely sure what to do about that. I would have to research what happens to the process group once the process group leader exits. Because if a new process becomes the group leader, and the process group acquires a new id (which I expect is the case) there's no real way to reference all of those processes anymore. You'd have to capture a process tree or something before the leader exited, and that might change over time.
Omnifarious
@Omnifatious, I'm a little confused as to how I can differentiate between a timeout error and a test error (how did you know I was going to run into this problem down the line? lol). In the code you've edited, (if self.proc.poll() is None: return self.proc.returncode == 0), doesn't checking if proc.poll() is None mean that the process is still running? If it's still running, how can proc.returncode be equal to 0, in the next line? Sorry if this is a stupid question!
iman453
@user388025 - I made a mistake. *sheepish grin* I should've had `self.proc.poll() is not None`. instead.
Omnifarious
@Omnifatious: Sweet..thanks :)
iman453