views: 68
answers: 3
I'm controlling long-running simulations (hours, days, even weeks) with a bash script that iterates over all the desired parameters. If only one simulation runs at a time, its output is piped to "tee"; otherwise the output is simply redirected (">") to an output file. All outputs are huge: some log files are ~2 GB and could grow even bigger.

The script works, but it is hell to maintain. When we add a new parameter it takes a while to adapt the script and all the sed-foo in it. So I've ported it to Python, and it's working GREAT.

The only problem preventing me from using it in production is that I can't find the right way to call Popen() to launch the program. If I run it "silent", piping everything to the file and not showing any output, Python takes gigabytes of RAM before the simulation is done.

Here's the code snippet:

import shlex
import subprocess

fh = open(logfile, "w")
pid = subprocess.Popen(shlex.split(command), stdout=fh)  # really a Popen object, not a PID
pids.append(pid)

I've read a lot about Popen and output handling, but I thought that piping the output to a file would flush the buffer when needed?

Maybe subprocess' Popen() is not the best tool for this? What's the best way to show and save a program's output to screen and file without eating all the RAM?

Thanks!

+1  A: 

Why not write silently to a file and then tail it?

You can use file.flush() to clear Python's file buffer.


Python will happily pick up new lines appended to a file it already has open. For instance:

f = open( "spam.txt", "r" )
f.read()
# 'I like ham!'
# Now open up spam.txt in some other program and add a new line.
f.read()
# 'I like eggs too!'
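
For what it's worth, here is a rough sketch of the tailing idea (untested; follow() and the half-second polling interval are mine, not the linked implementation):

import time

def follow(path):
    """Yield lines as they are appended to the file at path."""
    with open(path, "r") as f:
        f.seek(0, 2)              # start at the current end of the file
        while True:
            line = f.readline()
            if not line:          # nothing new yet; wait and retry
                time.sleep(0.5)
                continue
            yield line

Looping over follow(logfile) and writing each line to sys.stdout would then mirror the log to the screen without ever holding more than one line in memory.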
katrielalex
Tail wouldn't be aware of any new lines; logic for that would need to be added, and that should be avoided. It wouldn't solve the problem either: it seemed Python was taking 2 GB of RAM when only outputting to a file (not even printing to screen).
big_gie
The link is to a Python implementation of `tail`; it would work. See new code.
katrielalex
And flushing the file buffer regularly should stop Python using huge amounts of memory.
katrielalex
+1  A: 

The simplest solution was to change the code so it outputs to stdout AND a log file. Then the output does not need to be saved using tee or a pipe.

import shlex
import subprocess
import sys

pipe_verbose = sys.stdout
pipe_silent  = open('/dev/null', 'w')

subprocess.Popen(shlex.split(command), stdout=pipe_silent)
subprocess.Popen(shlex.split(command), stdout=pipe_verbose)

and finally I poll() to see when it's done.

Piping has the nice side effect that if I Ctrl+C the script, it kills the job too. If I didn't put stdout=... in the Popen(), the job would continue in the background. Also, Python's CPU usage stays at 0% this way; a readline loop on a pipe would push it to 100%...
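
Put together, the pattern might look roughly like this (the ./simulate command strings and the 10-second sleep are made up for illustration):

import shlex
import subprocess
import sys
import time

def launch(command, verbose):
    """Start one simulation; show its output on screen only in verbose mode."""
    out = sys.stdout if verbose else open('/dev/null', 'w')
    return subprocess.Popen(shlex.split(command), stdout=out)

# Hypothetical parameter sweep: be verbose only when a single job runs.
commands = ["./simulate --param 1", "./simulate --param 2"]
procs = [launch(cmd, verbose=(len(commands) == 1)) for cmd in commands]

# poll() returns None while a job is still running.
while any(p.poll() is None for p in procs):
    time.sleep(10)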

big_gie
@big_gie: Are you running the same command twice?
MattH
He's most definitely running the same command twice, in parallel.
ddotsenko
Actually, all this is enclosed in a function; I just pasted the relevant lines showing how the subprocesses are launched. I wanted to emphasize the stdout= option and its pipe.
big_gie
A: 

If the output has reliably occurring delimiters (markers indicating the end of an output section), consider doing the "bad" thing: read the stdout chunks from the subprocess in a separate thread and write the individual chunks to the log, flushing after every write.

Take a look here for some examples of non-blocking reads from subprocess' pipe:

http://stackoverflow.com/questions/3076542/how-can-i-read-all-availably-data-from-subprocess-popen-stdout-non-blocking/3078292#3078292
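
A bare-bones sketch of the thread idea (the command and log name are placeholders; the linked answer covers the fuller non-blocking variant):

import shlex
import subprocess
import threading

def drain(pipe, log_path):
    """Copy a child's stdout into a log file, flushing after every chunk."""
    with open(log_path, "wb") as log:
        for chunk in iter(lambda: pipe.read(4096), b""):
            log.write(chunk)
            log.flush()           # keeps memory flat and the log file current

# Hypothetical command and log name.
proc = subprocess.Popen(shlex.split("./simulate --param 1"),
                        stdout=subprocess.PIPE)
reader = threading.Thread(target=drain, args=(proc.stdout, "simulation.log"))
reader.start()
proc.wait()
reader.join()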

ddotsenko