I have some data that I would like to gzip, uuencode and then print to standard out. What I basically have is:

import subprocess
from subprocess import Popen

compressor = Popen("gzip", stdin=subprocess.PIPE, stdout=subprocess.PIPE)
encoder = Popen(["uuencode", "dummy"], stdin=compressor.stdout)

The way I feed data to the compressor is through compressor.stdin.write(stuff).

What I really need to do is to send an EOF to the compressor, and I have no idea how to do it.

At some point, I tried compressor.stdin.close() but that doesn't work -- it works well when the compressor writes to a file directly, but in the case above, the process doesn't terminate and stalls on compressor.wait().

Suggestions? gzip is just an example here; what I really need is to pipe the output of one process into another.

Note: The data I need to compress won't fit in memory, so communicate() isn't really a good option here. Also, if I just run

compressor.communicate("Testing")

after the two lines above, it still hangs, and the traceback ends at

  File "/usr/lib/python2.4/subprocess.py", line 1041, in communicate
    rlist, wlist, xlist = select.select(read_set, write_set, [])
+1  A: 

If you just want to compress and don't need the file wrappers, consider using the zlib module:

import zlib
compressed = zlib.compress("text")
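Two caveats: zlib.compress() takes the whole input at once and produces zlib-wrapped data, not a .gz stream. A compression object avoids both. This is a minimal Python 3 sketch, assuming the input arrives as an iterable of bytes chunks; the helper name gzip_chunks is hypothetical. Passing wbits=31 (16 + 15) asks zlib for a gzip header and trailer.

```python
import zlib


def gzip_chunks(chunks, level=6):
    """Yield gzip-format output for an iterable of bytes chunks, in bounded memory.

    gzip_chunks is a hypothetical helper name, not a zlib API.
    """
    # wbits=31 selects the gzip container instead of the zlib wrapper
    # that zlib.compress() uses.
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # compress() may buffer internally and return b""
            yield out
    # flush() emits the remaining compressed data plus the gzip trailer.
    yield comp.flush()
```

Each yielded piece can be written out (or handed to the next stage) as soon as it appears, so only zlib's internal window is held in memory.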

Is there any reason why the shell=True and Unix pipe suggestions won't work?

from subprocess import Popen, PIPE

pipes = Popen("gzip | uuencode dummy", stdin=PIPE, stdout=PIPE, shell=True)
for i in range(1, 100):
    pipes.stdin.write("some data")
pipes.stdin.close()
print pipes.stdout.read()

This seems to work.
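The loop above writes all of its input before reading anything. That is fine for a few hundred bytes, but with large data the pipeline's output can fill the OS pipe buffer while Python is still writing, and then every process waits on another. Since the goal is to print to standard out anyway, one option is not to pipe stdout at all and let the shell pipeline inherit Python's stdout. This is a minimal Python 3 sketch; the helper name stream_through_shell is hypothetical.

```python
from subprocess import PIPE, Popen


def stream_through_shell(cmd, chunks, stdout=None):
    """Feed bytes chunks into a shell pipeline without reading its output in Python.

    stream_through_shell is a hypothetical helper. With stdout=None the
    pipeline writes straight to this process's standard output.
    """
    proc = Popen(cmd, shell=True, stdin=PIPE, stdout=stdout)
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)
    finally:
        # Closing our end of the pipe is what delivers EOF to the first command.
        proc.stdin.close()
    return proc.wait()
```

For the question's case this would be stream_through_shell("gzip | uuencode dummy", data_chunks), where data_chunks is any iterable of bytes (for example, a file read in fixed-size blocks). If Python has already printed something, call sys.stdout.flush() first so the output is not interleaved out of order.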

cobbal
+3  A: 

This is not the sort of thing you should be doing directly in Python; there are eccentricities in how things work that make it a much better idea to do this with a shell. If you can just use subprocess.Popen("foo | bar", shell=True), all the better.

What might be happening is that gzip has not yet been able to write out all of its output, and the process will not exit until its writes to stdout have finished.

You can look at what system call a process is blocking on if you use strace. Use ps auxwf to discover which process is the gzip process, then use strace -p $pidnum to see what system call it is performing. Note that stdin is FD 0 and stdout is FD 1, you will probably see it reading or writing on those file descriptors.

Jerub
+4  A: 

I suspect the issue is the order in which you open the pipes. uuencode is funny in that it will complain when you launch it if there's no incoming pipe set up in just the right way (try launching the darn thing on its own in a Popen call, with just PIPE as stdin and stdout, to see the explosion).

Try this:

from subprocess import Popen, PIPE

encoder = Popen(["uuencode", "dummy"], stdin=PIPE, stdout=PIPE)
compressor = Popen("gzip", stdin=PIPE, stdout=encoder.stdin)

compressor.communicate("UUencode me please")
encoded_text = encoder.communicate()[0]
print encoded_text

begin 644 dummy
F'XL(`%]^L$D``PL-3<U+SD])5<A-52C(24TL3@4`;2O+"!(`````
`
end

You are right, btw... there is no way to send a generic EOF down a pipe. After all, each program really defines its own EOF. The way to do it is to close the pipe, as you were trying to do.
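Closing the pipe only delivers EOF once every process holding its write end has closed it. One plausible contributor to the original hang is that in Python 2 close_fds defaults to False, so the uuencode child can inherit a copy of the write end of gzip's stdin and keep gzip waiting for more input. Since Python 3.2 the default on POSIX is True. Below is a minimal Python 3 sketch of the two-process pipeline from the question under that assumption. The parent closes its own copy of the first process's stdout after handing it to the second, and a writer thread feeds input while the main thread drains the output, so neither side needs to hold everything in memory. The helper name pipe_two is hypothetical.

```python
import threading
from subprocess import PIPE, Popen


def pipe_two(first_cmd, second_cmd, chunks, out):
    """Stream bytes chunks through `first_cmd | second_cmd`, copying the result to `out`.

    pipe_two is a hypothetical helper; `out` is any binary file-like object.
    Returns the exit statuses of both processes.
    """
    first = Popen(first_cmd, stdin=PIPE, stdout=PIPE, close_fds=True)
    # close_fds=True keeps `second` from inheriting the write end of first's stdin.
    second = Popen(second_cmd, stdin=first.stdout, stdout=PIPE, close_fds=True)
    # The parent no longer needs its copy of first's stdout; closing it leaves
    # `second` as the only reader (and `first` gets SIGPIPE if `second` exits).
    first.stdout.close()

    def feed():
        try:
            for chunk in chunks:
                first.stdin.write(chunk)
        finally:
            # This is the last open write end, so closing it sends EOF to `first`.
            first.stdin.close()

    writer = threading.Thread(target=feed)
    writer.start()
    # Drain the final output while the writer runs, so no pipe buffer fills up.
    for block in iter(lambda: second.stdout.read(65536), b""):
        out.write(block)
    writer.join()
    second.stdout.close()
    return first.wait(), second.wait()
```

For the question's case this would be pipe_two(["gzip"], ["uuencode", "dummy"], data_chunks, sys.stdout.buffer).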

EDIT: I should be clearer about uuencode. As a shell program, its default behaviour is to expect console input. If you run it without a "live" incoming pipe, it will block waiting for console input. Because you opened the encoder second, before you had sent anything down the compressor pipe, the encoder was blocked waiting for you to start typing. Jerub was right that something was blocking.

Jarret Hardie
Unfortunately, it still requires the entire output to be stored in memory; if you try to read() the encoder's stdout, it still seems to hang.
cobbal
When you use communicate(), you shouldn't read stdout yourself: communicate() returns the stdout contents as the first element of a (stdoutdata, stderrdata) tuple. You could change this to use writes and reads on both pipes, but see the comment from monkut, where the Python docs advise against that.
Jarret Hardie
If you have a huge amount of data, perhaps using temp files to pass data between the processes would be best?
Jarret Hardie
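Following the temp-file suggestion, here is a minimal Python 3 sketch. The first command's output is spooled to an anonymous temporary file on disk, and the second command reads that file once the first has exited, so no two processes are ever joined by a pipe and nothing has to fit in memory. The helper name two_stage_via_tempfile is hypothetical, and it assumes there is enough disk space for the whole intermediate result.

```python
import tempfile
from subprocess import PIPE, Popen


def two_stage_via_tempfile(first_cmd, second_cmd, chunks, out):
    """Run first_cmd on bytes chunks, spool its output to disk, then run second_cmd on it.

    two_stage_via_tempfile is a hypothetical helper. `out` must be a real
    file with a fileno(), e.g. sys.stdout.buffer or an open file.
    """
    with tempfile.TemporaryFile() as spool:
        first = Popen(first_cmd, stdin=PIPE, stdout=spool)
        try:
            for chunk in chunks:
                first.stdin.write(chunk)
        finally:
            first.stdin.close()  # EOF for the first command
        if first.wait() != 0:
            raise RuntimeError("first command failed")
        # The child advanced the shared file offset; rewind so the second
        # command reads the spooled output from the beginning.
        spool.seek(0)
        second = Popen(second_cmd, stdin=spool, stdout=out)
        return second.wait()
```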
Good point. Alternatively, you could hack together your own implementation from the uuencode(5) man page and binascii.b2a_uu(data) (binascii.a2b_uu is the decoding direction).
cobbal
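A minimal Python 3.7+ sketch of that hand-rolled route (the backtick=True flag of binascii.b2a_uu needs 3.7). It emits the same begin / data / ` / end layout as the sample output in the earlier answer, encoding at most 45 input bytes per line as uuencode does. The name uuencode_lines and its default arguments are hypothetical.

```python
import binascii


def uuencode_lines(chunks, name="dummy", mode=0o644):
    """Yield uuencoded lines (bytes) for an iterable of bytes chunks.

    uuencode_lines is a hypothetical helper, not a binascii API.
    """
    yield b"begin %o %s\n" % (mode, name.encode())
    buf = b""
    for chunk in chunks:
        buf += chunk
        start = 0
        # Each uuencode line carries at most 45 bytes of input.
        while len(buf) - start >= 45:
            yield binascii.b2a_uu(buf[start:start + 45], backtick=True)
            start += 45
        buf = buf[start:]  # keep the partial line for the next chunk
    if buf:
        yield binascii.b2a_uu(buf, backtick=True)
    # A zero-length line (a lone backtick) marks the end of the data.
    yield b"`\nend\n"
```

To mirror gzip | uuencode dummy, feed it gzip-compressed chunks and write each yielded line to sys.stdout.buffer.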
I second the notion, though, that maybe what you really want is to use shell=True and pipe from gzip to uuencode directly in the shell command.
Jarret Hardie
shell=True did it for me. Yay.
bsdfish