views:

881

answers:

4

I have some commands which I am running using the subprocess module. I then want to loop over the lines of the output. The documentation says do not do data_stream.stdout.read which I am not but I may be doing something which calls that. I am looping over the output like this:

for line in data_stream.stdout:
   #do stuff here
   .
   .
   .

Can this cause deadlocks like reading from data_stream.stdout or are the Popen modules set up for this kind of looping such that it uses the communicate code but handles all the callings of it for you?

A: 

data_stream.stdout is a standard output handle. you shouldn't be looping over it. communicate returns tuple of (stdoutdata, stderr). this stdoutdata you should be using to do your stuff.

SilentGhost
+2  A: 

You have to worry about deadlocks if you're communicating with your subprocess, i.e. if you're writing to stdin as well as reading from stdout. Because these pipes may be cached, doing this kind of two-way communication is very much a no-no:

data_stream = Popen(mycmd, stdin=PIPE, stdout=PIPE)
data_stream.stdin.write("do something\n")
for line in data_stream:
  ...  # BAD!

However, if you've not set up stdin (or stderr) when constructing data_stream, you should be fine.

data_stream = Popen(mycmd, stdout=PIPE)
for line in data_stream.stdout:
   ...  # Fine

If you need two-way communication, use communicate.

chrispy
Almost right, but, as the docs say about `communicate`, "The data read is buffered in memory, so do not use this method if the data size is large or unlimited."
Alex Martelli
So if I never mess with stderr or stdin than looping in that fashion is fine.
Matt
+2  A: 

SilentGhost's/chrispy's answers are OK if you have a small to moderate amount of output from your subprocess. Sometimes, though, there may be a lot of output - too much to comfortably buffer in memory. In such a case, the thing to do is start() the process, and spawn a couple of threads - one to read child.stdout and one to read child.stderr where child is the subprocess. You then need to wait() for the subprocess to terminate.

This is actually how communicate() works; the advantage of using your own threads is that you can process the output from the subprocess as it is generated. For example, in my project python-gnupg I use this technique to read status output from the GnuPG executable as it is generated, rather than waiting for all of it by calling communicate(). You are welcome to inspect the source of this project - the relevant stuff is in the module gnupg.py.

Vinay Sajip
+4  A: 

The two answer have caught the gist of the issue pretty well: don't mix writing something to the subprocess, reading something from it, writing again, etc -- the pipe's buffering means you're at risk of a deadlock. If you can, write everything you need to write to the subprocess FIRST, close that pipe, and only THEN read everything the subprocess has to say; communicate is nice for the purpose, IF the amount of data is not too large to fit in memory (if it is, you can still achieve the same effect "manually").

If you need finer-grain interaction, look instead at pexpect or, if you're on Windows, wexpect.

Alex Martelli