Hello All,

How can I make an MPI process notify the others about an error, for example, especially in an MPI program where all the MPI processes are independent of each other (there is no synchronisation between the different MPI processes)?

Thanks

A:

I find your idea of an MPI program in which all the processes are independent very strange. I think that, by definition, the processes in an MPI program are not independent. For example, after you have called MPI_INIT they are all in the same communicator (MPI_COMM_WORLD), so they all 'know' of each other's existence. You may have written your code so that the processes do not synchronise after that, but the means still exist for processes to communicate with each other.

One mechanism to look into (which does require synchronisation) is MPI_BCAST (broadcast). Another approach would be to use MPI_ISEND, the non-blocking send operation. But sooner or later some process will have to post a matching receive, and your sending process ought to test (with MPI_TEST or MPI_WAIT) whether the send has completed.

High Performance Mark
Well, the more independent they are, the closer you get to perfect parallelisation, because you reduce communication and waiting time.
A: 

There is nothing in the MPI Standard that allows for an "interrupt" to be sent from one rank to another rank (or ranks). In general, progression requires that user code enter the MPI library from time to time. Absent progression, there is no standard way to communicate between the ranks.

Synchronization requires that from time to time there is some entry into the MPI library. MPI_Barrier is the "big hammer" approach to synchronization. Combined with MPI_Reduce_Scatter, it would be possible to know there is some error on at least one rank.

semiuseless
A: 

Being independent and having no synchronisation are two entirely different scenarios when dealing with MPI, thanks to non-blocking communication.

It seems to me that what you want can be implemented this way: when an error occurs, a process broadcasts a message with a designated "error" tag, and each process periodically posts non-blocking receives for a message with this tag. If a process receives such a message, an error has occurred recently and it can react accordingly; otherwise it continues its normal execution.

(Note that "broadcasting" in this case doesn't refer to MPI_Bcast, since that's a collective communication operation, and as such blocks. Instead, it simply means sending the same message to everyone it may concern. If you want to maintain no synchronisation between the processes, then this sending will have to be non-blocking as well.)

suszterpatt
@suszterpatt: I agree, but I fear that you may confuse the OP by suggesting that a 'broadcast' can be matched by ordinary point-to-point receives. I know what you mean, but MPI uses the term 'broadcast' for a collective communication routine.
High Performance Mark
Good call, edited the answer to clarify.
suszterpatt