views:

280

answers:

2

Python documentation to Popen states:

Warning Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

Now, I'm trying to figure out how this deadlock can occur and why.

My mental model: subproccess produces something to stdout/err, which is buffered and after buffer is filled, it's flushed to stdout/err of subproccess, which is send through pipe to parent process.

From what documentation states, pipe has it's own buffer and when it's filled or subprocess terminates, it's flushed to to the parent process.

Either way (with pipe buffer or not), I'm not entirely sure how deadlock can occur. Only thing I can think of is some kind of "global" OS pipe buffer processes will be striving for, which sounds strange. Another is that more processes will share same pipe, which should not happen on it's own.

Can someone please explain this?

+1  A: 

A deadlock can occur when both buffers (stdin and stdout) are full: your program is waiting to write more input to the external program, and the external program is waiting for you to read from its output buffer first.

This can be solved by using non-blocking I/O and properly prioritizing the buffers. You can try to make it work yourself, but communicate() just does that for you.

Wim
Yup, but we're talking about reading from larger-then-memory-size data. But thanks for clarification.
Almad
+2  A: 

Careful, this has a subtle mistake in it.

My mental model: subproccess produces something to stdout/err, which is buffered and after buffer is filled, it's flushed to stdout/err of subproccess, which is send through pipe to parent process.

The buffer is shared by parent and child process.

Subprocess produces something to stdout, which is the same buffer the parent process is supposed to be reading from.

When the buffer is filled, writing stops until the buffer is emptied. Flush doesn't mean anything to a pipe, since two processes share the same buffer.

Flushing to disk means that the device driver must push the bytes down to the device. Flushing a socket means to tell TCP/IP to stop waiting to accumulate a buffer and send stuff. Flushing to a console means stop waiting for a newline and push the bytes through the device driver to the device.

S.Lott
This is what I was not so sure about. Thanks.
Almad