I have an MPI program which compiles and runs, but I would like to step through it to make sure nothing bizarre is happening. Ideally, I would like a simple way to attach GDB to any particular process, but I'm not really sure whether that's possible or how to do it. An alternative would be having each process write debug output to a separate log file, but this doesn't really give the same freedom as a debugger.

Are there better approaches? How do you debug MPI programs?

+1  A: 

The "standard" way to debug MPI programs is by using a debugger which supports that execution model.

On UNIX, TotalView is said to have good support for MPI.

+8  A: 

As someone else said, TotalView is the standard for this. But it will cost you an arm and a leg.

The OpenMPI site has a great FAQ on MPI debugging. Item #6 in the FAQ describes how to attach GDB to MPI processes. Read the whole thing; there are some great tips.
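The attach technique usually boils down to having each process print its host and PID and then spin until a debugger changes a variable. Here is a minimal sketch of that idea. The function name `wait_for_debugger`, the environment variable `MPI_DEBUG_WAIT` and the flag `debugger_attached` are hypothetical names chosen for this example, not part of Open MPI.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for gethostname() under strict -std modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: call it right after MPI_Init().
 * It only waits when the (made-up) environment variable MPI_DEBUG_WAIT
 * is set, so normal runs are unaffected.
 * Returns 1 if it waited for a debugger, 0 otherwise. */
static int wait_for_debugger(void)
{
    volatile int debugger_attached = 0;  /* flip to 1 from inside gdb */
    char hostname[256];

    if (getenv("MPI_DEBUG_WAIT") == NULL)
        return 0;

    if (gethostname(hostname, sizeof(hostname)) != 0)
        snprintf(hostname, sizeof(hostname), "unknown-host");
    hostname[sizeof(hostname) - 1] = '\0';

    /* Tell the user where to attach, and make sure the line is visible. */
    printf("PID %d on %s ready for attach\n", (int)getpid(), hostname);
    fflush(stdout);

    /* Spin until the debugger sets the flag. */
    while (!debugger_attached)
        sleep(5);

    return 1;
}
```

With the variable set, you would log in to the printed host, run `gdb -p <PID>`, move up to the `wait_for_debugger` frame, run `set var debugger_attached = 1`, set your breakpoints, and continue.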

If you find that you have far too many processes to keep track of, though, check out Stack Trace Analysis Tool (STAT). We use this at Livermore to collect stack traces from potentially hundreds of thousands of running processes and to represent them intelligently to users. It's not a full-featured debugger (a full-featured debugger would never scale to 208k cores), but it will tell you which groups of processes are doing the same thing. You can then step through a representative from each group in a standard debugger.
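To make the "groups of processes doing the same thing" idea concrete, here is a toy sketch that groups ranks by identical stack-trace strings and remembers one representative per group. It is only an illustration and not how STAT is implemented; the names `trace_group`, `group_by_trace` and `MAX_GROUPS`, and the trace strings used with it, are made up for the example.

```c
#include <string.h>

#define MAX_GROUPS 16

/* One equivalence class: a stack trace and the ranks that reported it. */
struct trace_group {
    const char *trace;   /* e.g. "main>solve>MPI_Waitall" */
    int count;           /* number of ranks sharing this trace */
    int representative;  /* first rank seen with this trace */
};

/* Group ranks by identical trace string; traces[i] is the trace of rank i.
 * Returns the number of groups written to groups[] (at most MAX_GROUPS). */
static int group_by_trace(const char *const traces[], int nranks,
                          struct trace_group groups[MAX_GROUPS])
{
    int ngroups = 0;

    for (int rank = 0; rank < nranks; rank++) {
        int g;

        /* Look for an existing group with the same trace. */
        for (g = 0; g < ngroups; g++) {
            if (strcmp(groups[g].trace, traces[rank]) == 0)
                break;
        }
        if (g == ngroups) {
            if (ngroups == MAX_GROUPS)
                continue;  /* sketch only: ignore traces that do not fit */
            groups[g].trace = traces[rank];
            groups[g].count = 0;
            groups[g].representative = rank;
            ngroups++;
        }
        groups[g].count++;
    }
    return ngroups;
}
```

The `representative` rank of each group is the one you would then attach a standard debugger to.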

tgamblin
+3  A: 

http://github.com/jimktrains/pgdb/tree/master is a utility I wrote to do this very thing. There are some docs, and feel free to PM me with questions.

You basically call a Perl program that wraps GDB and funnels its I/O to a central server. This allows GDB to be running on each host and for you to access it on each host at the terminal.

jimktrains
Thanks! I will definitely check this out next time I'm working in MPI.
Jay Conrod
A: 

I use this little homebrewed method to attach a debugger to MPI processes: call the following function, DebugWait(rank), right after MPI_Init() in your code. While rank 0 waits for keyboard input and the other ranks block in the broadcast, you have all the time you need to attach the debugger to them and add breakpoints. When you are done, type a single character on rank 0 and you are ready to go.

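#include <stdio.h>
#include <mpi.h>

/* Holds every rank until a character is typed on rank 0's stdin. */
static void DebugWait(int rank) {
    char a = 0;

    if (rank == 0) {
        /* Only rank 0 reads the keyboard; attach debuggers while it waits. */
        scanf("%c", &a);
    }

    /* All other ranks block here until rank 0 broadcasts the character. */
    MPI_Bcast(&a, 1, MPI_BYTE, 0, MPI_COMM_WORLD);
    printf("%d: Starting now\n", rank);
}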

Of course you would want to compile this function for debug builds only.

MPI has required the most debug statements I have ever written for even simple code. (lol) This can be very helpful.
Troggy
A: 

I do some MPI-related debugging with log traces, but you can also run gdb if you're using mpich2: MPICH2 and gdb
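For the log-trace approach, one file per rank keeps the output from interleaving. The following is a small sketch of that idea, not a standard API: the helper names `rank_log_name`, `rank_log_open` and `rank_log`, and the file-name pattern, are assumptions made for this example. The rank would come from MPI_Comm_rank() in a real program.

```c
#include <stdarg.h>
#include <stdio.h>

/* Build a per-rank log file name such as "debug.rank-0003.log".
 * The naming scheme is an arbitrary choice for this sketch. */
static int rank_log_name(char *buf, size_t len, int rank)
{
    return snprintf(buf, len, "debug.rank-%04d.log", rank);
}

/* Open (and truncate) the log file for this rank. */
static FILE *rank_log_open(int rank)
{
    char name[64];

    rank_log_name(name, sizeof(name), rank);
    return fopen(name, "w");
}

/* printf-style trace line, prefixed with the rank and flushed immediately
 * so the output survives a crash or a hang. */
static void rank_log(FILE *log, int rank, const char *fmt, ...)
{
    va_list args;

    if (log == NULL)
        return;
    fprintf(log, "[rank %d] ", rank);
    va_start(args, fmt);
    vfprintf(log, fmt, args);
    va_end(args);
    fputc('\n', log);
    fflush(log);
}
```

A call such as `rank_log(log, rank, "entering solve, n=%d", n)` before and after each communication step is usually enough to see where a rank stopped making progress.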

Jim Hunziker
A: 

There is also my open-source tool, padb, which aims to help with parallel programming. I call it a "Job Inspection Tool" because it functions not only as a debugger but also, for example, as a parallel top-like program. Run in "Full Report" mode, it'll show you stack traces of every process within your application along with local variables for every function over every rank (assuming you compiled with -g). It'll also show you the "MPI message queues", that is, the list of outstanding sends and receives for each rank within the job.

As well as showing the full report, it's also possible to tell padb to zoom in on individual bits of information within the job. There are a myriad of options and configuration items to control what information is shown; see the web page for more details.

Padb

Ashley Pittman
A: 

I have found gdb quite useful. I use it as

mpirun -np <NP> xterm -e gdb ./program 

This then launches one xterm window per process, each running gdb, in which I can do

run <arg1> <arg2> ... <argN>

This usually works fine.

messenjah
A: 

http://valgrind.org/ nuf said


More specific link: Debugging MPI Parallel Programs with Valgrind

Chad Brewbaker
Valgrind is not the same as an interactive debugger, but it is nice to know it works with MPI.
Jay Conrod