views: 1753
answers: 5

Hello,

I've recently needed to write a script that performs an os.fork() to split into two processes. The child process becomes a server process and passes data back to the parent process using a pipe created with os.pipe(). The child closes the 'r' end of the pipe and the parent closes the 'w' end, as usual. I convert the file descriptors returned by os.pipe() into file objects with os.fdopen.
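Stripped down, the setup looks roughly like this (names simplified for illustration):

import os

r_fd, w_fd = os.pipe()
pid = os.fork()

if pid:                        # parent
    os.close(w_fd)             # parent closes the 'w' end
    r = os.fdopen(r_fd, 'r')
    data = r.read()            # this is the call that blocks
    print "parent got:", data
else:                          # child: the "server"
    os.close(r_fd)             # child closes the 'r' end
    w = os.fdopen(w_fd, 'w')
    w.write("data from the server\n")
    w.flush()
    w.close()
    os._exit(0)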

The problem I'm having is this: the process forks successfully and the child becomes a server. Everything works great and the child dutifully writes data to the open 'w' end of the pipe. Unfortunately, the parent's end of the pipe does two strange things: first, it blocks on the read() call on the 'r' end of the pipe; second, it fails to read any data that was put on the pipe unless the 'w' end is entirely closed.

I immediately thought that buffering was the problem and added pipe.flush() calls, but these didn't help.

Can anyone shed some light on why the data doesn't appear until the writing end is fully closed? And is there a strategy to make the read() call non-blocking?

This is my first Python program that forked or used pipes, so forgive me if I've made a simple mistake.

A: 

The "parent" vs. "child" part of fork in a Python application is silly. It's a legacy from 16-bit unix days. It's an affectation from a day when fork/exec and exec were Important Things to make the most of a tiny little processor.

Break your Python code into two separate parts: parent and child.

The parent part should use subprocess to run the child part.

A fork and exec may happen somewhere in there -- but you don't need to care.

S.Lott
The "parent" vs "child" thing is part of the essential semantics of starting a subprocess. One is the subprocess, and the other isn't.
Charlie Martin
While it's true that fork creates a parent and a child, that isn't essential for creating a subprocess. OpenVMS does not work that way. The subprocess module is far simpler than this fork malarkey.
S.Lott
@S.Lott: Don't think of `fork()` as being the equivalent of `CreateProcess()` in Windows, or the equivalent in VMS, which is basically what the subprocess module is modeled after. `fork()` is much more like starting a new thread, except that the thread happens to have a different process space (and so you need to communicate with it via pipes instead of shared memory). Using the `subprocess` module you need to run through the process initialization (such as parsing config files or command-line arguments) twice, while with `fork()` you don't. As such, `fork()` can be much more efficient.
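A toy sketch of that point -- the dict below stands in for whatever expensive initialization the parent has already done, and the forked child simply inherits it instead of redoing it:

import os, time

# pretend this is expensive startup work (parsing config files, arguments, ...)
config = {"port": 4000, "loaded_at": time.time()}

pid = os.fork()
if pid == 0:
    # child: already has the parsed config in memory, no re-initialization
    print "child sees port", config["port"]
    os._exit(0)
os.waitpid(pid, 0)             # with subprocess, the child would have to rebuild config itself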
Daniel Pryden
+2  A: 

Here's some example code for doing just this.

Charlie Martin
This is the site I based my code off of originally. Thanks
Paradox
+2  A: 

Using

fcntl.fcntl(readPipe, fcntl.F_SETFL, os.O_NONBLOCK)

before invoking read() solved both problems: the read() call no longer blocks, and the data appears after just a flush() on the writing end.
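A slightly fuller sketch of the same idea, preserving any flags already set on the descriptor and coping with the case where nothing has been written yet (readPipe is the fdopen()'d read end from the question; set_nonblocking is just a helper name for illustration):

import os, fcntl, errno

def set_nonblocking(f):
    flags = fcntl.fcntl(f, fcntl.F_GETFL)            # keep any existing flags
    fcntl.fcntl(f, fcntl.F_SETFL, flags | os.O_NONBLOCK)

set_nonblocking(readPipe)

try:
    chunk = os.read(readPipe.fileno(), 4096)         # whatever the child has written so far
except OSError, e:
    if e.errno != errno.EAGAIN:
        raise
    chunk = ''                                       # nothing available yet; try again later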

Paradox
+3  A: 

I see you have solved the problem of blocking I/O and buffering.

A note in case you decide to try a different approach: subprocess is the equivalent of, and a replacement for, the fork/exec idiom. It seems that's not what you're doing here: you have just a fork (no exec) and are exchanging data between the two processes -- for that, the multiprocessing module (in Python 2.6+) would be a better fit.
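For instance, a rough sketch of the same parent/child exchange done with multiprocessing (the module calls are real, the rest is illustrative):

import time
from multiprocessing import Process, Pipe

def server(conn):
    # plays the role of the forked child/server
    for i in range(10):
        conn.send("line %s" % i)
        time.sleep(1)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=server, args=(child_conn,))
    p.start()
    child_conn.close()                     # parent only uses its own end
    try:
        while True:
            print "parent read:", parent_conn.recv()   # blocks per message, not until EOF
    except EOFError:
        pass                               # child closed its end
    p.join()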

dF
This module looks very interesting. Thanks, I'll check it out.
Paradox
+1 for mentioning the difference between `fork()` (what the OP is trying to do here) and the `fork`/`exec` idiom encapsulated by the subprocess module, which is something completely different.
Daniel Pryden
+5  A: 

Are you using read() without specifying a size, or treating the pipe as an iterator (for line in f)? If so, that's probably the source of your problem: read() is defined to read until the end of the file before returning, rather than just returning what is currently available for reading. That means it will block until the child calls close().

In the example code linked to, this is OK - the parent is acting in a blocking manner and just using the child for isolation purposes. If you want the parent to carry on while the child is still writing, then either use non-blocking I/O as in the code you posted (but be prepared to deal with half-complete data), or read in chunks (e.g. r.read(size) or r.readline()), which will block only until a specific size / line has been read. (You'll still need to call flush on the child.)

It also looks like treating the pipe as an iterator adds yet another layer of buffering, so "for line in r:" may not give you what you want if you need each line to be consumed immediately. It may be possible to disable this, but just specifying 0 for the buffer size in fdopen doesn't seem to be sufficient.

Here's some sample code that should work:

import os, time

r, w = os.pipe()                                     # raw file descriptors
r, w = os.fdopen(r, 'r', 0), os.fdopen(w, 'w', 0)    # wrap as unbuffered file objects

pid = os.fork()
if pid:          # Parent
    w.close()
    while 1:
        data = r.readline()                          # blocks only until one line arrives
        if not data: break                           # EOF: child closed its end
        print "parent read: " + data.strip()
else:            # Child
    r.close()
    for i in range(10):
        print >>w, "line %s" % i
        w.flush()
        time.sleep(1)
Brian