The MPI standard (page 295) says:

Advice to users. Whether the errorcode is returned from the executable or from the MPI process startup mechanism (e.g., mpiexec), is an aspect of quality of the MPI library but not mandatory.

Indeed I had no success in running the following code:

if(0 == my_rank)
{
   FILE* parameters = fopen("parameters.txt", "r");
   if(NULL == parameters)
   {
     fprintf(stderr, "Could not open parameters.txt file.\n");
     printf("Could not open parameters.txt file.\n");
     exit(EXIT_FAILURE); //Tried MPI_Abort() as well
   }
   fscanf(parameters, "%i %f %f %f", N, X_DIMENSION_Dp, Y_DIMENSION_Dp, HEIGHT_DIMENSION_Dp);
   fclose(parameters);
}

I am not able to get the error code back into the shell in order to decide on further actions. Neither of the two error messages is printed. I am considering writing the error codes and messages to a dedicated file instead.

Has anyone had a similar problem, and what options did you consider for reliable error reporting?

EDIT:
The problem was not caused by MPI. What was actually wrong was the way I handled the error codes returned by the scheduler. I use a system with LoadLeveler installed. First I run

$ llsubmit my_job_file.sh

then, upon completion of the job, I receive an email with the status of the job and its return code. In my case the return code was always zero, even when my MPI program had exited via MPI_Abort. Then I realized that the code being reported was that of the script my_job_file.sh itself, not that of the MPI program run within the script. my_job_file.sh looked like this:

# @ different LoadLeveler options ...
poe ./my_mpi_program > my_mpi_program.output

Then I modified it to be

# @ different LoadLeveler options ...
poe ./my_mpi_program > my_mpi_program.output
exit $?

and then I finally got the error code I wanted.
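For anyone who wants to try the idea without a scheduler, here is a minimal sketch of the same pattern. The function fake_mpi_program is a hypothetical stand-in for `poe ./my_mpi_program` (it just fails with status 3), and the output path is an assumption for illustration. The point it shows is that `$?` is overwritten by every later command, so it is safest to save it right after the program runs and pass the saved value to `exit`.

```shell
#!/bin/sh
# Hypothetical stand-in for `poe ./my_mpi_program`: prints a line and fails with status 3.
fake_mpi_program() {
    echo "simulated program output"
    return 3
}

# Assumed output location, chosen only for this sketch.
out="${TMPDIR:-/tmp}/my_mpi_program.output"

# Body of a job script, run in a subshell so that its exit status can be inspected below.
job_script() (
    fake_mpi_program > "$out"
    status=$?                      # save at once; every later command overwrites $?
    echo "finished with status $status" >> "$out"
    exit "$status"                 # propagate the program's status, not that of echo
)

# The caller can now branch on the status that the job script reports.
if job_script; then
    job_status=0
    echo "job succeeded"
else
    job_status=$?
    echo "job failed with status $job_status"
fi
```

Without the saved `status` variable, the final `exit` would report the status of the `echo` that appends to the output file, which is zero, and the failure would be hidden again.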

+1  A: 

MPI_Abort should work.

 int MPI_Abort( MPI_Comm comm, int errorcode )
Taylor Leese
In general, MPI_Abort is a reasonable suggestion. However, it should never be called from within a signal handler: the resulting behavior is undefined, and in most cases the program will hang.
semiuseless
Interesting point, but is that the OP's situation?
Taylor Leese
The OP was asking about returning an error code (as opposed to the more general 'return code'), so I felt that the caution was worth mentioning.
semiuseless