When using mpirun, is it possible to catch signals (for example, the SIGINT generated by ^C) in the code being run?

For example, I'm running parallelized Python code. I can catch KeyboardInterrupt with an except clause when running python blah.py by itself, but not when running mpirun -np 1 python blah.py.
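To make the standalone case concrete, here is a minimal sketch of the kind of handling described above. The names run and simulate_ctrl_c are made up for this illustration, and the script sends SIGINT to itself as a stand-in for pressing ^C in a terminal; it says nothing about what happens under mpirun.

```python
import os
import signal
import time

# Make sure Python's default SIGINT behaviour (raise KeyboardInterrupt) is
# active; it is not installed if the process started with SIGINT ignored.
signal.signal(signal.SIGINT, signal.default_int_handler)


def run(work):
    """Run work() and report whether a SIGINT interrupted it."""
    try:
        work()
    except KeyboardInterrupt:
        # Cleanup (closing files, writing a checkpoint) would go here.
        return "interrupted"
    return "finished"


def simulate_ctrl_c():
    # Hypothetical stand-in for a user pressing ^C: signal our own process.
    os.kill(os.getpid(), signal.SIGINT)
    # KeyboardInterrupt is raised here at the latest, cutting the sleep short.
    time.sleep(5)
```

Calling run(simulate_ctrl_c) takes the except branch, while a function that returns normally yields "finished".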

Does anyone have a suggestion? Even finding how to catch signals in a C or C++ compiled program would be a helpful start.

If I send a signal directly to the spawned Python processes, they handle it properly; however, a signal sent to the parent orterun process (e.g. from exceeding wall time on a cluster, or from pressing control-C in a terminal) kills everything immediately.

A: 

The signal module supports setting signal handlers using signal.signal:

Set the handler for signal signalnum to the function handler. handler can be a callable Python object taking two arguments (see below), or one of the special values signal.SIG_IGN or signal.SIG_DFL. The previous signal handler will be returned ...

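import signal

def ignore(sig, stack):
    # Runs instead of the default KeyboardInterrupt when SIGINT arrives.
    print("I'm ignoring signal %d" % (sig,))

# Install the handler for SIGINT (the signal that ^C sends).
signal.signal(signal.SIGINT, ignore)
while True:
    pass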

If you send a SIGINT to a Python interpreter running this script (via kill -INT <pid>), it will print a message and simply continue to run.
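Rather than ignoring the signal, a long-running job may prefer to finish its current step and exit cleanly. The following is a sketch of that variant, assuming the job should react to both SIGINT and SIGTERM (whether a given cluster scheduler sends SIGTERM at the wall-time limit is an assumption to check locally). The names request_stop, install_handlers, main_loop and demo are hypothetical.

```python
import os
import signal
import time

stop_requested = False


def request_stop(signum, frame):
    # Keep the handler tiny: just record that a stop was requested.
    global stop_requested
    stop_requested = True


def install_handlers():
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, request_stop)


def main_loop(max_steps=1000):
    """Do small units of work until a signal asks us to stop."""
    steps = 0
    while not stop_requested and steps < max_steps:
        time.sleep(0.01)   # placeholder for one unit of real work
        steps += 1
    return steps


def demo():
    install_handlers()
    # Stand-in for an external kill -TERM <pid> aimed at this process.
    os.kill(os.getpid(), signal.SIGTERM)
    return main_loop(), stop_requested
```

Here demo() returns long before max_steps is reached, because the flag is set as soon as the handler runs. As with the example above, this only helps when the signal actually reaches the Python process.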

Torsten Marek
Thanks, but unfortunately this doesn't really answer my question. I didn't want to know how to intercept signals in Python; I wanted to know how to prevent `mpirun` from catching them first, or how to pass them on from the runner (`orted` or `orterun`). When executing `mpirun -np 1 python test.py`, it still dies when I hit control-C or when I send orterun a signal. (If I find the pid of the Python interpreter that orterun spawns and signal that, it works; but this isn't really what I want.)
Seth Johnson
A: 

If you use mpirun --nw, then mpirun itself should terminate as soon as it's started the subprocesses, instead of waiting for their termination; if that's acceptable then I believe your processes would be able to catch their own signals.

Alex Martelli
For some reason, this argument isn't being recognized by mpirun on my mac (even though it shows up in `man`). `mpirun -n 2 --nw python test.py` gives the error: `Failed to find the following executable: Executable: --nw`, and I've tried different arrangements of the args and tried `-nw` instead as well. Any idea what's up? Thanks.
Seth Johnson
Per http://www.open-mpi.org/faq/?category=osx#osx-bundled-ompi it looks like the OS X-bundled Open MPI may not be the best; this one looks like a bug, for example. I'd try their advice, e.g. with http://www.open-mpi.org/software/ompi/v1.2/downloads/openmpi-1.2.4.dmg.gz first to see if it fixes the bug, else go with http://openmpi.darwinports.com/ , etc.
Alex Martelli
Seth Johnson
But now the problem as you stated it ("will die when I hit control-C or when I send orterun a signal") disappears: there's no orterun left that you might accidentally signal, and ^C would no longer hurt any of your processes, right? So if your problem is something else, you should maybe edit the question to clarify it. No more than one process can be the foreground one in any terminal, so that can't be what you're asking, can it?
Alex Martelli