views:

195

answers:

1

Hey all -

I'm writing an MPI program (Visual Studio 2k8 + MSMPI) that uses Boost::thread to spawn two threads per MPI process, and have run into a problem I'm having trouble tracking down.

When I run the program with: mpiexec -n 2 program.exe, one of the processes suddenly terminates:

job aborted:
[ranks] message

[0] terminated

[1] process exited without calling finalize

---- error analysis -----

[1] on winblows
program.exe ended prematurely and may have crashed. exit code 0xc0000005


---- error analysis -----

I have no idea why the first process is suddenly terminating, and can't figure out how to track down the reason. This happens even if I put the rank zero process into an infinite loop at the end of all of it's operations... it just suddenly dies. My main function looks like this:

int _tmain(int argc, _TCHAR* argv[])
{
    /* Initialize the MPI execution environment. */
    MPI_Init(0, NULL);

    /* Create the worker threads. */
    boost::thread masterThread(&Master);
    boost::thread slaveThread(&Slave);

    /* Wait for the local test thread to end. */
    masterThread.join();
    slaveThread.join();

    /* Shutdown. */
    MPI_Finalize();
    return 0;
}

Where the master and slave functions do some arbitrary work before ending. I can confirm that the master thread, at the very least, is reaching the end of it's operations. The slave thread is always the one that isn't done before the execution gets aborted. Using print statements, it seems like the slave thread isn't actually hitting any errors... it's happily moving along and just get's taken out in the crash.

So, does anyone have any ideas for:
a) What could be causing this?
b) How should I go about debugging it?

Thanks so much!

Edit:

Posting minimal versions of the Master/Slave functions. Note that the goal of this program is purely for demonstration purposes... so it isn't doing anything useful. Essentially, the master threads send a dummy payload to the slave thread of the other MPI process.

void Master()
{   
    int  myRank;
    int  numProcs;
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    /* Create a message with numbers 0 through 39 as the payload, addressed 
     * to this thread. */
    int *payload= new int[40];
    for(int n = 0; n < 40; n++) {
        payload[n] = n;
    }

    if(myRank == 0) {
        MPI_Send(payload, 40, MPI_INT, 1, MPI_ANY_TAG, MPI_COMM_WORLD);
    } else {
        MPI_Send(payload, 40, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD);
    }

    /* Free memory. */
    delete(payload);
}

void Slave()
{
    MPI_Status status;
    int *payload= new int[40];
    MPI_Recv(payload, 40, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    /* Free memory. */
    delete(payload);
}
+1  A: 

you have to use thread safe version of mpi runtime. read up on MPI_Init_thread.

aaa
Okay, checked it out. It looks like the system I'm on doesn't support MPI_THREAD_MULTIPLE; it only goes up to MPI_THREAD_SERIALIZED.Am I screwed?
HokieTux
As a rough guess, "serialized" means you'll be fine so long as you protect your MPI calls with some sort of critical-section lock so that you can guarantee that only one thread is doing an MPI call at a time.
Brooks Moses
@HokieTuxhard to tell. in my opinion doing send/receive from same mpi rank is not a good design, but I do not know your details. Try to design your solution so that you do not send and receive simultaneously, perhaps using threat signals/shared memory
aaa