views: 498

answers: 4

I would like to be able to spawn a process in Python and have two-way communication with it. Of course, Pexpect does this and is indeed a way I might go. However, it is not quite ideal.

My ideal situation would be a cross-platform, generic technique that involves only the standard Python library. The subprocess module gets pretty close, but the fact that I have to wait for the process to terminate before safely interacting with it is not desirable.

Looking at the documentation, it does say there are stdin, stdout and stderr file objects that I can manipulate directly, but there is a big fat warning that says "don't do this". Unfortunately it's not entirely clear why this warning exists, but from what I gather from Google it is related to OS buffering: it is possible to write code that unexpectedly deadlocks when those internal buffers fill up (as a side note, any examples that show the wrong way and the right way would be appreciated).

So, at the risk of potential deadlocks, I thought it might be interesting to use poll or select to interactively read from the running process without killing it. Although I lose (I think) the cross-platform ability, I like the fact that it requires no additional libraries. But more importantly, I would like to know whether this is a good idea. I have yet to try this approach, but I am concerned about gotchas that could potentially devastate my program. Can it work? What should I test for?

In my specific case I am not really concerned about being able to write to the process, just repeatedly reading from it. Also, I don't expect my processes to dump huge amounts of text, so I hope to avoid the deadlocking issue; however, I would like to know exactly what those limits are and be able to write some tests to see where things break down.
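For concreteness, this is roughly the loop I have in mind (POSIX only, and untested; some_command is just a placeholder):

    import os
    import select
    import subprocess

    # some_command stands in for whatever process I end up watching.
    proc = subprocess.Popen(['some_command'], stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()

    while True:
        # Wait at most one second for the pipe to become readable.
        readable, _, _ = select.select([fd], [], [], 1.0)
        if readable:
            data = os.read(fd, 4096)    # os.read avoids stdio buffering on my side
            if not data:                # empty read: the child closed its stdout
                break
            # ...handle data here...
        elif proc.poll() is not None:   # nothing to read and the child has exited
            break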

A: 

The short answer is that there is no such thing as a good cross-platform system for process management without designing that concept into your system from the start. This is especially true if you limit yourself to the standard library; even the various Unix flavors have their own compatibility issues.

Your best bet is to instrument all the processes with the proper event handling to notice events that come in from whatever IPC mechanism works best on each platform. Named pipes will be the general route for the problem you describe, but there will be implementation differences on each platform.
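On the Unix side, for example, the reader end of a named pipe can be as simple as this sketch (POSIX only; the path and handle_event are placeholders, and on Windows named pipes go through the win32 API instead):

    import os

    path = '/tmp/myapp_events'          # placeholder path
    if not os.path.exists(path):
        os.mkfifo(path)

    # Reader side: open() blocks until a writer opens the other end.
    fifo = open(path)
    for line in fifo:
        handle_event(line)              # placeholder for your event handling
    fifo.close()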

drudru
+2  A: 

Use the multiprocessing module in the Python 2.6 standard library.

It has a Queue class that can be used for both reading and writing.
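A minimal sketch of the idea, assuming the child's work can be written as a Python function rather than an external command:

    from multiprocessing import Process, Queue

    def worker(q):
        # The child puts results on the queue; the parent reads them
        # while the child is still running.
        for i in range(5):
            q.put('progress %d%%' % (i * 25))
        q.put(None)                 # sentinel marking the end of output

    if __name__ == '__main__':
        q = Queue()
        p = Process(target=worker, args=(q,))
        p.start()
        while True:
            msg = q.get()           # blocks until the child sends something
            if msg is None:
                break
            print msg
        p.join()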

Seun Osewa
I upvoted this, having recently built a project based on this module. There are a number of gotchas if you are doing cross-platform development, though, such as Windows' lack of a fork implementation, which means fork is emulated in Python with some odd side effects. However, as the poster stated, it gives you nice fluffy ways to communicate with your child process and makes things very easy in general.
jkp
+1  A: 

I do this in a separate thread, using message queues to communicate between the threads. In my case the subprocess prints % complete to stdout, and I wanted the main thread to put up a pretty progress bar.

    if sys.platform == 'win32':
        self.shell = False
        self.startupinfo = subprocess.STARTUPINFO()
        self.startupinfo.dwFlags = 0x01   # STARTF_USESHOWWINDOW
        self.startupinfo.wShowWindow = 0  # SW_HIDE: no console window for the child
    else:
        self.shell = True
        self.startupinfo = None

. . .

    f = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE,
                         stdout=subprocess.PIPE, env=env, shell=self.shell,
                         startupinfo=self.startupinfo)
    f.stdin.close()
    line = ''
    while True:
        log.debug('reading')
        # Read one character at a time; read(1) returns as soon as a byte
        # is available, so we never wait for more output than the child wrote.
        c = f.stdout.read(1)

        log.debug(c)

        if len(c) == 0:
            log.info('stdout empty; must be done')
            break
        if ord(c) == 13:
            # skip carriage returns from Windows-style line endings
            continue
        if c == '%':
            # post % complete message to waiting thread.
            line = ''
        else:
            line += c

    log.info('checking for errors')
    errs = f.stderr.readlines()

    if errs:
        prettyErrs = 'Reported Errors: '
        for i in errs:
            prettyErrs += i.rstrip('\n')

        log.warn(prettyErrs)
        # post errors to waiting thread
    else:
        print 'done'
    return
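The hand-off to the main thread looks roughly like this (run_command stands for the loop above, and update_progress_bar for whatever the GUI does; both names are placeholders):

    import threading
    import Queue  # the module is named 'queue' in Python 3

    progress = Queue.Queue()

    def reader():
        # run_command is the routine shown above; assume it calls
        # progress.put(percent) where the comments say "post % complete
        # message to waiting thread", then returns when the child exits.
        run_command()
        progress.put(None)          # sentinel: the subprocess has finished

    threading.Thread(target=reader).start()

    while True:
        pct = progress.get()        # the main thread blocks here
        if pct is None:
            break
        update_progress_bar(pct)    # placeholder for the GUI update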
Norman
I worked on this over the weekend, and that is close to the solution I came up with. It has nothing to do with polling for data being ready and everything to do with stdio being a buffered stream. In my particular case I can get away with assuming that stdout will be flushed regularly by the executed script, so a simple blocking readline running in a separate thread does everything I need. If someone has a working example of how to unbuffer stdio, I would be extremely interested.
Voltaire
er, by stdio I mean stdout
Voltaire
A: 

Forgive my ignorance on this topic, but couldn't you just launch Python with the -u flag for "unbuffered" output?

This might also be of interest... http://www.gossamer-threads.com/lists/python/python/658167
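Something like this, assuming the child is itself a Python script (script.py is a placeholder):

    import subprocess

    # -u makes the child interpreter's stdout unbuffered, so each print in
    # script.py reaches the parent as soon as it happens.
    p = subprocess.Popen(['python', '-u', 'script.py'], stdout=subprocess.PIPE)

    # Use readline rather than plain iteration over the file object, to avoid
    # its read-ahead buffer delaying lines.
    for line in iter(p.stdout.readline, ''):
        print line.rstrip()
    p.wait()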