I am running the following version of Python:

$ /usr/bin/env python --version
Python 2.5.2

I am running the following Python code to write data from a child subprocess to standard output, and reading that into a Python variable called metadata:

# Extract metadata (snippet from extractMetadata.py)
inFileAsGzip = "%s.gz" % inFile
if os.path.exists(inFileAsGzip):
    os.remove(inFileAsGzip)
os.symlink(inFile, inFileAsGzip)
extractMetadataCommand = "bgzip -c -d -b 0 -s %s %s" % (metadataRequiredFileSize, inFileAsGzip)
metadataPipes = subprocess.Popen(extractMetadataCommand, stdin=None, stdout=subprocess.PIPE, shell=True, close_fds=True)
metadata = metadataPipes.communicate()[0]
metadataPipes.stdout.close()
os.remove(inFileAsGzip)
print metadata

The use case is as follows, to pull the first ten lines of standard output from the aforementioned code snippet:

$ extractMetadata.py | head

The error will appear if I pipe into head, awk, grep, etc.

The script ends with the following error:

close failed: [Errno 32] Broken pipe

I would have thought closing the pipes would be sufficient, but obviously that's not the case.

A: 

Hmmm. I've seen some "Broken pipe" strangeness with subprocess + gzip before. I never did figure out exactly why it was happening but by changing my implementation approach, I was able to avoid the problem. It looks like you're just trying to use a backend gzip process to decompress a file (probably because Python's builtin module is horrendously slow... no idea why but it definitely is).

Rather than using communicate(), you can instead treat the process as a fully asynchronous backend and just read its output as it arrives. When the process dies, the subprocess module will take care of cleaning things up for you. The following snippet should provide the same basic functionality without any broken pipe issues.

import subprocess

gz_proc = subprocess.Popen(['gzip', '-c', '-d', 'test.gz'], stdout=subprocess.PIPE)

# Read the decompressed output in 4 KiB chunks until EOF.
l = list()
while True:
    dat = gz_proc.stdout.read(4096)
    if not dat:
        break
    l.append(dat)

file_data = ''.join(l)
Rakis
Thank you for your answer. I still get broken pipe errors with this approach. Perhaps Popen() and write() do not cooperate well with respect to piping output to a csh/bash shell.
Alex Reynolds
A: 

There's not enough information to answer this conclusively, but I can make some educated guesses.

First, os.remove should definitely not be failing with EPIPE. It doesn't look like it is, either; the error is close failed: [Errno 32] Broken pipe, not remove failed. It looks like close is failing, not remove.

It's possible for closing a pipe's stdout to give this error. If data is buffered, Python will flush the data before closing the file. If the underlying process is gone, doing this will raise IOError/EPIPE. However, note that this isn't a fatal error: even when this happens, the file is still closed. The following code reproduces this about 50% of the time, and demonstrates that the file is closed after the exception. (Watch out; I think the behavior of bufsize has changed across versions.)

    import os, subprocess
    metadataPipes = subprocess.Popen("echo test", stdin=subprocess.PIPE,
        stdout=subprocess.PIPE, shell=True, close_fds=True, bufsize=4096)
    metadataPipes.stdin.write("blah"*1000)
    print metadataPipes.stdin
    try:
        metadataPipes.stdin.close()
    except IOError, e:
        print "stdin after failure: %s" % metadataPipes.stdin

This is racy; it only happens part of the time. That may explain why it looked like removing or adding the os.remove call affects the error.

That said, I can't see how this would happen with the code you've provided, since you don't write to stdin. It's the closest I can get without a usable repro, though, and maybe it'll point you in the right direction.

As a side note, you shouldn't check os.path.exists before deleting a file that may not exist; it'll cause race conditions if another process deletes the file at the same time. Instead, do this:

import errno, os

try:
    os.remove(inFileAsGzip)
except OSError, e:
    # Ignore "no such file"; re-raise anything else.
    if e.errno != errno.ENOENT:
        raise

... which I usually wrap in a function like rm_f.
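
A minimal sketch of such a wrapper follows. The body is just the try/except above moved into a function; the `except ... as e` spelling assumes Python 2.6 or later, so on 2.5 write `except OSError, e:` instead.

```python
import errno
import os


def rm_f(path):
    """Remove path like `rm -f`: a missing file is not an error."""
    try:
        os.remove(path)
    except OSError as e:
        # Only swallow "no such file or directory"; anything else
        # (permissions, path is a directory, ...) is still raised.
        if e.errno != errno.ENOENT:
            raise
```

With this, the exists-then-remove pair in the question would become a single rm_f(inFileAsGzip) call.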

Finally, if you explicitly want to kill a subprocess, there's metadataPipes.kill--just closing its pipes won't do that--but that doesn't help explain the error. Also, again, if you're just reading gzip files you're much better off with the gzip module than a subprocess. http://docs.python.org/library/gzip.html
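
For the plain-gzip case, reading only the start of a file with the gzip module could look roughly like this. It is a sketch, not a drop-in replacement for the bgzip command in the question: read_gzip_prefix is a hypothetical helper name, and it assumes the file is something the gzip module can read.

```python
import gzip


def read_gzip_prefix(path, size):
    """Return up to `size` decompressed bytes from the start of a gzip file."""
    f = gzip.open(path, 'rb')
    try:
        # read(size) stops after `size` bytes of decompressed data,
        # so the rest of the file is never decompressed.
        return f.read(size)
    finally:
        f.close()
```

No child process or pipe is involved here, so there is nothing on the input side that could raise EPIPE.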

Glenn Maynard
kill() is not available in 2.5.2. There is no stdin; I have edited the question to reflect this. I cannot use the gzip module for this task, although using the gzip binary reproduces the errors.
Alex Reynolds
The question doesn't make sense now: you're running the script through `head`, but the script has no output. Please provide a complete, self-contained repro; don't make us experiment with partial code that doesn't even execute as-is, trying to guess what you're talking about.
Glenn Maynard
Okay, never mind. Thanks for your help.
Alex Reynolds
I ask for more information and your response is "never mind"? Seriously?
Glenn Maynard
A: 

Getting the first 10 lines from a process output might work better this way:

import os

# cmdline is the shell command string to run
ph = os.popen(cmdline, 'r')
lines = []
for s in ph:
    lines.append(s.rstrip())
    if len(lines) == 10: break
print '\n'.join(lines)
ph.close()
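
The same idea can be sketched with subprocess instead of os.popen. This is an illustration only: first_lines is a hypothetical helper name, and the command is passed as an argument list rather than a shell string.

```python
import itertools
import subprocess


def first_lines(args, n=10):
    """Run args and return at most the first n lines of its stdout."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    try:
        # islice stops pulling from the pipe once n lines have been read.
        lines = [line.rstrip() for line in itertools.islice(proc.stdout, n)]
    finally:
        # Close our end of the pipe, then reap the child.
        proc.stdout.close()
        proc.wait()
    return lines
```

On Python 3 the returned lines are bytes objects, since the pipe is opened in binary mode by default.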
Vlad
What if I want to handle standard output differently? In other words, instead of using head, I pipe the output into an awk script. I am getting a broken pipe whenever I pipe the output somewhere else.
Alex Reynolds
If my above version of the script fails when piped through awk, then your problem has nothing to do with the subprocess. What am I missing?
Vlad
A: 

I would say this exception has nothing to do with the subprocess call nor its file descriptors (after a communicate call the popen object is completely closed). This seems to be the classic problem due to the closing of sys.stdout in a pipe:

http://bugs.python.org/issue1596

Despite being a 3-year-old bug, it has not been solved. Since sys.stdout.write(...) does not seem to help either, you can resort to the lower-level call:

os.write(sys.stdout.fileno(), metadata)
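
If the reader (head, awk, ...) may exit before all of the output has been written, the os.write call itself can still fail with EPIPE. One way to handle that quietly is sketched below. write_ignoring_epipe is a hypothetical helper name; the sketch assumes Python 2.6 or later for the `except ... as e` syntax (on 2.5, write `except OSError, e:`), and on Python 3 the data must be bytes.

```python
import errno
import os


def write_ignoring_epipe(fd, data):
    """Write all of data to fd; return False if the reader has gone away."""
    try:
        while data:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, data)
            data = data[written:]
    except OSError as e:
        if e.errno == errno.EPIPE:
            return False  # the downstream process closed the pipe
        raise
    return True
```

It would be called as write_ignoring_epipe(sys.stdout.fileno(), metadata) in place of the print statement.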
tokland