Tuesday, January 05, 2010

debug using gdb in open mpi

6. Can I use serial debuggers (such as gdb) to debug MPI applications?

Yes; the Open MPI developers do this all the time.

There are two common ways to use serial debuggers:

Attach to individual MPI processes after they are running.

For example, launch your MPI application as normal with mpirun. Then login to the node(s) where your application is running and use the --pid option to gdb to attach to your application.

An inelegant-but-functional technique commonly used with this method is to insert the following code in your application where you want to attach:

{
int i = 0;
char hostname[256];
gethostname(hostname, sizeof(hostname));
printf("PID %d on %s ready for attach\n", getpid(), hostname);
fflush(stdout);
while (0 == i)
sleep(5);
}

This code will output a line to stdout outputting the name of the host where the process is running and the PID to attach to. It will then spin on the sleep() function forever waiting for you to attach with a debugger. Using sleep() as the inside of the loop means that the processor won't be pegged at 100% while waiting for you to attach.

Once you attach with a debugger, go up the function stack until you are in this block of code (you'll likely attach during the sleep()) then set the variable i to a nonzero value. With GDB, the syntax is:

(gdb) set var i = 7

Then set a breakpoint after your block of code and continue execution until the breakpoint is hit. Now you have control of your live MPI application and use the full functionality of the debugger.

You can even add conditionals to only allow this "pause" in the application for specific MPI processes (e.g., MPI_COMM_WORLD rank 0, or whatever process is misbehaving).

No comments: