I'm controlling long-running simulations (hours, days, even weeks) with a bash script that iterates over all the wanted parameter combinations. If only one simulation runs at a time, its output is piped to "tee"; otherwise the output is simply redirected with ">" to an output file. All the outputs are huge: some log files are ~2 GB and could grow even bigger.
The script works, but it's hell to maintain. When we add a new parameter, it takes some time to adapt the script and all the sed-foo in it. So I've ported it to Python. It's working GREAT.
The only problem preventing me from using it in production is that I can't find the right way to call Popen() to launch the program. If I run it "silent" by piping everything to the file and not showing any output, Python takes gigabytes of RAM before the simulation is done.
Here's the code snippet:
    import shlex
    import subprocess

    fh = open(logfile, "w")
    pid = subprocess.Popen(shlex.split(command), stdout=fh)  # actually a Popen object, not a bare pid
    pids.append(pid)
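For context, the surrounding loop looks roughly like this (paraphrased; param_sets, build_command, and make_logfile_name are placeholders, not the real names):

    import shlex
    import subprocess

    pids = []
    for params in param_sets:                   # one run per parameter combination
        command = build_command(params)         # assemble the simulator command line
        logfile = make_logfile_name(params)     # one log file per parameter set
        fh = open(logfile, "w")
        pids.append(subprocess.Popen(shlex.split(command), stdout=fh))

    for p in pids:
        p.wait()                                # block until every simulation finishes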
I've read a lot of stuff about Popen and output redirection, but I thought that piping it to a file would flush the buffer when needed?
Maybe subprocess's Popen() is not the best tool for this? What's the best way to show and save a program's output to screen and file without eating all the RAM?
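To illustrate what I mean by showing and saving at the same time, here's the kind of tee-style loop I'd naively write for the single-simulation case (untested sketch; the 64 KiB chunk size is arbitrary):

    import shlex
    import subprocess
    import sys

    proc = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE)
    with open(logfile, "wb") as fh:
        # read fixed-size chunks so only one chunk lives in memory at a time
        for chunk in iter(lambda: proc.stdout.read(65536), b""):
            fh.write(chunk)
            sys.stdout.buffer.write(chunk)   # echo to the terminal, like tee
            sys.stdout.buffer.flush()
    proc.wait()

But I'm not sure whether looping over the pipe in Python like this is the right approach, or whether it adds overhead compared to letting the child write to the file directly.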
Thanx!