ansaurus

Question

Am I allowed to throw an exception inside MPI-parallelized code?

Answer 1

A:

Whether or not exceptions will work during parallel execution depends on your compiler and MPI library implementation. If you want portable behavior, I'd avoid throwing exceptions in that context.

If you want more detailed error information than a numeric return code, you can return or pass around error strings or other objects, either within the same process or through MPI.
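As a rough illustration of returning an error object instead of throwing, here is a minimal sketch. The names `Status` and `parse_positive` are hypothetical and not part of any MPI library, and treating code 0 as success is an assumption of this sketch.

```cpp
#include <string>

// Hypothetical status type: a numeric code plus a human-readable message.
struct Status {
    int code;             // 0 means success (an assumption of this sketch)
    std::string message;  // empty on success

    bool ok() const { return code == 0; }
};

// Hypothetical worker: reports failure through its return value
// rather than by throwing an exception.
Status parse_positive(int value) {
    if (value <= 0) {
        return Status{1, "expected a positive value, got " + std::to_string(value)};
    }
    return Status{0, ""};
}
```

A caller checks `ok()` and decides locally what to do. If other ranks need to know, the code and message can be sent through MPI like any other integer and character buffer.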

aschepler 2010-09-27 16:42:36

Answer 2

+1 A:

In an ideal world, you can use them to do what you ask. By "ideal world" I mean one where you have your choice of MPI implementation and can administer it yourself (instead of convincing the cluster owner to reconfigure it for you). The minimal configuration for exceptions includes the --with-exceptions flag, and possibly a few more.

I've used LAM most often, and by default exceptions are disabled. I believe this is the default for other implementations as well.

They work in the same vein as 'vanilla' C++ exceptions, and they do work inside code that executes in parallel.

At some point in your startup code, you want to enable them:

MPI::COMM_WORLD.Set_errhandler ( MPI::ERRORS_THROW_EXCEPTIONS );

(If your library isn't configured to allow exceptions, this is probably a bad idea: the behaviour is "undefined" according to LAM.)

And then:

try { /* something that can fail */ }
catch ( const MPI::Exception& e ) {   // catch by const reference to avoid a copy
    cout << "Oops: " << e.Get_error_string() << " (code " << e.Get_error_code() << ")" << endl;
    MPI::COMM_WORLD.Abort (-1) ;
}

As for it being good or bad practice, I can't really say. I haven't seen extensive use of them in code written by hardened MPI hackers, but that may be because the code is generally more C than C++ in my experience.

A middle ground between error codes and exceptions may be error handlers. In a nutshell, you can assign functions that will be called when a particular error (designated by code) occurs. This might be an option if you can't get your administrator on board with enabling exceptions.

John Carter 2010-09-27 18:21:51

Answer 3

A:

Exceptions work the same in MPI code as in serial code, but you have to be extremely careful if the exception might not be raised on all processes in a communicator, or you can easily end up with deadlock.

MPI_Barrier(comm);            /* Or any synchronous call */
if (!rank) throw Exception("early exit on rank=0");
MPI_Barrier(comm);            /* rank>0 deadlocks here because rank=0 exited early */

All error handling methods have this problem: it is difficult to recover from errors that do not occur consistently across a communicator. In the case above, you could perform an MPI_Allreduce so that all ranks choose the same branch.

My preference is for calling error handlers and propagating them up the stack, since this tends to give me the most useful/verbose error message and it's easy to catch with a breakpoint (or the error handler can attach a debugger to itself and send it to your workstation in an xterm).

Jed 2010-09-27 20:23:09
