I have some legacy scientific code running on a Rocks cluster, with SGE. I have an application-specific job submission script that generates qsub scripts (i.e. the script which Sun Grid Engine takes and runs).

Within the qsub script, my legacy app is called. This app sends its output to STDOUT. SGE intercepts STDOUT and spools it into a file in the user's home directory, so the user can see results build up in real time. I want to keep this behavior, but at the same time I want to transparently log all output in the background. I figured tee would be perfect for this.

So I modified the job submission script to run the app and pipe STDOUT to tee, which saves STDOUT to a file that is copied to a central store once the job completes. The app is run and piped to tee as follows:

\$GMSCOMMAND | tee \$SCRATCHDIR/gamess_output.log

The problem is, ever since I started piping the app's output to tee, the app has been dying with SIGTERMs, especially when I request several nodes. I tried using the -i (ignore interrupts) parameter with tee: it makes no difference.

Things work fine if I redirect the app's output to a file and then cat the file once the app is done, but then users can't watch results build up in real time (which is an important requirement).

Any ideas about why this use of tee might be failing? Or alternatively, any ideas about how else I might achieve the desired functionality?

+1  A: 

I don't know why your particular case is failing, but one option might be to make $GMSCOMMAND do its own logging (effectively putting the tee inside the app). I guess this option depends on the cost of changing the legacy app.

Failing that you could wrap the 'legacy app' with your own script/application to do the redirection/duplication.

Douglas Leeder
Thanks Douglas, it would have been a good idea if I were allowed to modify the legacy app, but as a matter of policy I can't do this.
tramdas
Ok, could you try replacing the `tee` command with `cat` - to see if it's the pipe that's the problem?
Douglas Leeder
Indeed, `| cat` fails the same way `| tee` does, so it does seem like the pipe is the problem. With this in mind I'm starting to wonder about somehow executing `$GMSCOMMAND > $outputfile` detached, and then streaming `$outputfile` to STDOUT until EOF. But before attempting that I really need to properly look through the Sun Grid Engine documentation to see if there's some relevant setting. Thanks for your help thus far; if you have any other ideas, I'm all ears...
tramdas
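The idea in the comment above (run `$GMSCOMMAND > $outputfile` detached, then stream the file to STDOUT until the app finishes) might be sketched roughly as follows. This is not code from the thread: `run_and_follow` is a hypothetical helper name, and the sketch assumes GNU `tail`, whose `--pid` option makes `tail -f` stop once the given process has exited.

```shell
#!/bin/bash
# Sketch: the command writes straight to a log file (no pipe attached to it),
# while tail follows that file on our own stdout until the command exits.
# run_and_follow is a hypothetical name; --pid requires GNU tail.
run_and_follow() {
    local logfile=$1
    shift
    : > "$logfile"              # create/truncate the log so tail can open it
    "$@" > "$logfile" &         # run the command detached, stdout to the file
    local cmd_pid=$!
    # -n +1 prints from the first line; --pid ends the follow after cmd_pid dies.
    tail -n +1 -f --pid="$cmd_pid" "$logfile"
    wait "$cmd_pid"             # return the command's own exit status
}
```

In the job script this could be called as `run_and_follow "$SCRATCHDIR/gamess_output.log" $GMSCOMMAND`. Because `tail` only polls the process periodically, the function may return up to about a second after the app itself has finished.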
As pipe is the problem - I suggest writing your own wrapper.
Douglas Leeder
Does it fail when piping to tee if you run it interactively, not from SGE? @Douglas Leeder: SGE runs your job script with the shell, so it's highly unlikely that having that script run another script would make any difference. Hmm, unless it's using something other than /bin/sh. I also had the idea of finding out what file stdout is going to, and copying it after $GMSCOMMAND exits. SGE sets quite a few variables in the job's environment...
Peter Cordes
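The "find out where stdout is going and copy it afterwards" idea could look something like the sketch below. It relies on `SGE_STDOUT_PATH`, which the qsub documentation lists among the variables set in a job's environment; the helper name `run_then_copy_stdout` and its arguments are hypothetical. Treat it as a starting point under those assumptions, not a tested recipe for any particular SGE setup.

```shell
#!/bin/bash
# Sketch: leave SGE's own stdout spooling untouched (no pipe on the app),
# then copy the spool file to the log location after the command exits.
# run_then_copy_stdout is a hypothetical helper name.
run_then_copy_stdout() {
    local dest=$1
    shift
    "$@"                        # output goes to SGE's spool file as usual
    local status=$?
    if [ -n "${SGE_STDOUT_PATH:-}" ] && [ -f "$SGE_STDOUT_PATH" ]; then
        cp -- "$SGE_STDOUT_PATH" "$dest"
    else
        echo "SGE_STDOUT_PATH is unset or not a regular file; nothing copied" >&2
    fi
    return "$status"            # preserve the command's exit status
}
```

One caveat: the spool file holds everything the job script has printed so far, so the copy will also include output from any commands that ran before the app.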
First of all, sorry for the amazingly late reply... for the last few months I've been occupied with other, more pressing work. I've discovered that it's an issue with my specific legacy app; with other apps, the pipe doesn't cause a SIGPIPE/SIGTERM or whatever... and on that note, I didn't get to discover exactly what signal it was, because I could not run strace: it's not installed on the slave nodes. Anyway, I wrote a wrapper that utilizes forkpty in an obvious way. It's pretty standard usage of forkpty, so I won't post the code here unless someone asks. Thanks all.
tramdas
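Where strace is not available, bash itself can often say which signal ended the first command of a pipeline, since it reports death by signal as an exit status of 128 plus the signal number. A minimal sketch, assuming bash (for `PIPESTATUS`); `describe_status` and `run_with_tee` are hypothetical helper names, not anything from the thread:

```shell
#!/bin/bash
# Sketch: report how the command on the left side of "cmd | tee" ended.
describe_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        # By convention, 128 + N means "terminated by signal N";
        # kill -l maps such a status back to the signal name.
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "exited with status $status"
    fi
}

run_with_tee() {
    local logfile=$1
    shift
    "$@" | tee "$logfile"
    local status=${PIPESTATUS[0]}   # status of the command, not of tee
    describe_status "$status" >&2
    return "$status"
}
```

A plain `$?` after the pipeline would only give tee's status, which is why `PIPESTATUS[0]` is used. An app that deliberately calls `exit` with a value above 128 would be misreported, so the result is a hint, not proof.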
A: 

If pipes are your problem perhaps you can get around this by using a 'while/read' loop with process substitution. Does this work for you?

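# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r line; do
    printf '%s\n' "$line"
    printf '%s\n' "$line" >> "${SCRATCHDIR}/gamess_output.log"
done < <(${GMSCOMMAND})   # process substitution (bash) must be redirected with "< <(...)"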
SiegeX