MPI is running correctly based on what you've described -- it is your assumptions that are incorrect. In every MPI implementation (that I have used, anyway), the entire program is run from beginning to end on every process. The MPI_Init and MPI_Finalize functions are required to set up and tear down MPI structures for each process, but they do not mark the beginning and end of parallel execution. The beginning of the parallel section is the first instruction in main, and the end is the final return.
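You can see this for yourself with a minimal sketch like the following (the printed messages are mine, purely illustrative). Every process executes the whole of main, including the statements outside the MPI_Init/MPI_Finalize pair:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    // Runs on every process: the launcher starts N full copies of the program
    printf("Before MPI_Init -- executed by every process\n");

    MPI_Init(&argc, &argv);

    int myid;
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    printf("Process %d is between MPI_Init and MPI_Finalize\n", myid);

    MPI_Finalize();

    // Also runs on every process, after MPI tear-down
    printf("After MPI_Finalize -- still executed by every process\n");
    return 0;
}

Launched with mpirun -np 2, both the "Before" and "After" lines appear twice, one per process, which is exactly the behavior described above.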
A good "template" program for what it seems like you want would be (also answered in http://stackoverflow.com/questions/2156714/how-to-speed-up-this-problem-by-mpi):
#include <mpi.h>
#include <stdio.h>

// Placeholder declarations for your serial/parallel stages
void PreParallelWork(void);
void ParallelWork(void);
void PostParallelWork(void);

int main(int argc, char *argv[]) {
    int numprocs, myid;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) { // Do the serial part on a single MPI process
        printf("Performing serial computation on cpu %d\n", myid);
        PreParallelWork();
    }

    ParallelWork(); // Every MPI process runs the parallel work

    if (myid == 0) { // Do the final serial part on a single MPI process
        printf("Performing the final serial computation on cpu %d\n", myid);
        PostParallelWork();
    }

    MPI_Finalize();
    return 0;
}
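Assuming the file is saved as template.c (the name is just an example), you would build and launch it with the usual MPI compiler wrapper and launcher, e.g.:

mpicc template.c -o template
mpirun -np 4 ./template

(Some distributions use mpiexec instead of mpirun.) With -np 4, only rank 0 performs PreParallelWork and PostParallelWork, while all four ranks execute ParallelWork.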