Test MPI_Barrier C++

I think that to be sure that the MPI_Barrier is working correctly you have to write a program which is guaranteed to behave differently for working and non-working barriers.

I don't think that @Neeraj's answer is guaranteed to behave that way. If the barrier is working correctly, all the processes will write their first output line before any of them writes a second. However, this can happen even in the absence of the barrier (or where the barrier has failed completely, if you want to think of it that way). My assertion does not depend on the very short sleep times he suggests (5ms*rank). Even if you suppose that the processes wait (5s*rank), the statements could still appear in the barrier-imposed order without the barrier. Unlikely, I grant you, but not impossible, especially when you consider how the OS buffers multiple writes to stdout: you might actually be testing that buffering, not the barrier. "Oh," you cry, "even the most inaccurate computer clock will result in process 1 waiting enough less time than process 2 to show the correct working of the barrier." Not if the OS preemptively grabs processor 1 (on which process 1 is trying to run) for 10s, it doesn't.

Dependence on the on-board clocks for synchronisation actually makes the program less deterministic. All the processors have their own clocks, and the hardware doesn't make any guarantees that they all tick at exactly the same rate or with exactly the same tick length.
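To make the clock problem concrete, here is a minimal sketch (not part of the original answer) of the comparison a timestamp-based test would have to make. It assumes each rank records one time just before calling MPI_Barrier and one just after it returns, for example with MPI_Wtime, and that those values are then collected in one place. The names `Verdict`, `check_barrier` and `max_skew` are hypothetical, and `max_skew` is an assumed bound on how far any two ranks' clocks may disagree, which MPI only promises to be zero when the MPI_WTIME_IS_GLOBAL attribute is true.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Possible outcomes of a timestamp-based barrier check.
enum class Verdict { Consistent, Violation, Inconclusive };

// enter[i] / leave[i]: times in seconds at which rank i called, and returned
// from, the barrier, each read from that rank's own clock.
// max_skew: ASSUMED upper bound on the disagreement between any two clocks.
Verdict check_barrier(const std::vector<double>& enter,
                      const std::vector<double>& leave,
                      double max_skew) {
    assert(!enter.empty() && enter.size() == leave.size());
    const double last_in = *std::max_element(enter.begin(), enter.end());
    const double first_out = *std::min_element(leave.begin(), leave.end());
    const double gap = first_out - last_in;

    if (gap >= max_skew) return Verdict::Consistent;  // holds even at worst-case skew
    if (gap < -max_skew) return Verdict::Violation;   // someone left before the last arrival
    return Verdict::Inconclusive;                     // difference is within clock error
}
```

With a large enough `max_skew` even a blatantly broken run comes back Inconclusive, and Consistent only means that this particular run did not expose a violation; it says nothing about the leaky cases.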

Nor does that test adequately explore all the failure modes of the barrier. At best it only explores complete failure. What if the implementation is actually a leaky barrier, so that occasionally a process gets through before the last process has reached it? Off-by-one errors are incredibly common in programs. Or perhaps the barrier code was written 3 years ago and only has enough memory to record the arrival of, say, 2^12 = 4096 processes, and you've put it on a brand new machine with 2^18 processors; then the barrier is more of a weir than a dam.

I haven't thought about this deeply until now: I've never suspected that any of the MPI implementations I've used had faulty barriers, so I don't have a good suggestion for how to test a barrier thoroughly. I'd be inclined to use a parallel debugger and examine the execution of the program through the barrier, but that's not going to provide a guarantee of correct behaviour either.

It's an interesting question though.

Regards

Mark

janneb 2010-01-14 08:08:47

For a question tagged C++. :(

MSalters 2010-01-14 08:47:22

I think it would now work for a question tagged C++.

Neeraj 2010-01-14 10:57:05

Not entirely sure - couldn't `std::cout` have its own barriers (MPI or otherwise)?

MSalters 2010-01-14 16:27:35

I partially agree with you, Mark: the sleep solution doesn't guarantee detection of a faulty barrier, although the probability of detection increases as the sleep time is increased. Also note that the argument to sleep() is actually in seconds.

Neeraj 2010-01-14 17:37:51

Hmmm, I'm not a C++ programmer, so I Googled for a definition of the sleep() function. The first useful hit I got was http://msdn.microsoft.com/en-us/library/ms686298(VS.85).aspx, where the argument looks like milliseconds to me. I guess sleep() is not part of the C++ language or library standards. But that's not really the point.

High Performance Mark 2010-01-14 17:55:37

A sleep() call which takes its argument in seconds is in POSIX. See e.g. http://www.opengroup.org/onlinepubs/000095399/functions/sleep.html

Of course, one should include unistd.h in order to use it.

janneb 2010-01-14 18:08:20
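One way to sidestep the seconds-versus-milliseconds confusion between POSIX sleep() and the Windows Sleep() is to let the type carry the unit. The sketch below assumes a C++11 or later compiler and uses std::this_thread::sleep_for; the helper names `stagger_delay` and `stagger` are hypothetical, and the 5 ms per rank figure is simply the one mentioned in the answer above.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: the per-rank delay discussed in this thread
// (5 ms * rank), with the unit stated in the type rather than implied
// by the platform's sleep function.
std::chrono::milliseconds stagger_delay(int rank) {
    assert(rank >= 0);
    return std::chrono::milliseconds(5) * rank;
}

// Blocks the calling thread for at least the delay computed for this rank.
void stagger(int rank) {
    std::this_thread::sleep_for(stagger_delay(rank));
}
```

In an MPI program, `rank` would be the value obtained from MPI_Comm_rank. Switching to seconds is then a matter of writing std::chrono::seconds instead, with no platform-specific header involved.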

Thanks for your time. But I should study more about this topic. So let me know if you have any opinions. Kind Regards

aryan 2010-01-16 09:38:52

@aryan. My opinion is that you should not worry that MPI_Barrier is not working until you have overwhelming evidence to back up your worries. Don't forget that MPI puts in barriers for some other operations too -- don't worry about them either.

High Performance Mark 2010-01-16 12:48:19
