I've just parallelized a Fortran routine that simulates individuals' behavior, and I've had some problems generating random numbers with the Vector Statistical Library (VSL, part of the Math Kernel Library). The structure of the program is the following:

program example
...
!$omp parallel do num_threads(proc) default(none) private(...) shared(...)
do i=1,n
call firstroutine(...)
enddo
!$omp end parallel do
...
end program example

subroutine firstroutine
...
call secondroutine(...)
...
end subroutine

subroutine secondroutine
...
VSL calls
...
end subroutine

I use the Intel Fortran Compiler for the compilation with a makefile that looks as follows:

f90comp = ifort
libdir = /home
mklpath = /opt/intel/mkl/10.0.5.025/lib/32/
mklinclude = /opt/intel/mkl/10.0.5.025/include/
exec: Example.o Firstroutine.o Secondroutine.o
      $(f90comp) -O3 -fpscomp logicals -openmp -o  aaa -L$(mklpath) -I$(mklinclude) Example.o -lmkl_ia32 -lguide -lpthread
Example.o: $(libdir)Example.f90
       $(f90comp) -O3 -fpscomp logicals -openmp -c $(libdir)Example.f90
Firstroutine.o: $(libdir)Firstroutine.f90
       $(f90comp) -O3 -fpscomp logicals -openmp -c $(libdir)Firstroutine.f90
Secondroutine.o: $(libdir)Secondroutine.f90
       $(f90comp) -O3 -fpscomp logicals -openmp -c -L$(mklpath) -I$(mklinclude) $(libdir)Secondroutine.f90  -lmkl_ia32 -lguide -lpthread

At compilation time everything works fine. When I run my program and generate variables with it, everything seems to work fine too. However, from time to time (say once every 200-500 iterations), it generates crazy numbers for a couple of iterations and then runs normally again. I have not found any pattern in when this corruption happens.

Any idea why this is happening?

A: 

The random number code is either using a global variable internally or all threads use the same generator. Eventually, two threads will try to update the same piece of memory at the same time and the result will be unpredictable.

So you must either allocate one random number generator per thread, or protect the call to the random routine with a semaphore/lock.
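
As a rough sketch of the "one generator per thread" idea, each thread can create its own VSL stream inside the parallel region. This is not the asker's code: the program name, `n`, `seed`, and the choice of the MT2203 family are assumptions for illustration, and the constant names follow recent MKL releases (older releases such as the 10.0 one in the makefile may spell the method constant differently).

```fortran
! Sketch only: requires Intel MKL and an OpenMP-enabled build.
include 'mkl_vsl.f90'

program per_thread_streams
  use mkl_vsl_type
  use mkl_vsl
  use omp_lib
  implicit none

  integer, parameter :: n = 1000        ! hypothetical iteration count
  integer, parameter :: seed = 12345    ! hypothetical seed
  type(vsl_stream_state) :: stream      ! private: one stream per thread
  integer :: i, tid, errcode
  real(kind=8) :: r(1)
  real(kind=8) :: total

  total = 0.0d0

  !$omp parallel private(stream, i, tid, errcode, r) reduction(+:total)
  tid = omp_get_thread_num()

  ! MT2203 is a family of independent generators; adding the thread
  ! number selects a different member for each thread.
  ! Error codes are ignored here for brevity.
  errcode = vslnewstream(stream, VSL_BRNG_MT2203 + tid, seed)

  !$omp do
  do i = 1, n
     ! Draw one uniform number in [0, 1) from this thread's own stream.
     errcode = vdrnguniform(VSL_RNG_METHOD_UNIFORM_STD, stream, 1, r, &
                            0.0d0, 1.0d0)
     total = total + r(1)
  end do
  !$omp end do

  errcode = vsldeletestream(stream)
  !$omp end parallel

  print *, 'mean of draws:', total / n
end program per_thread_streams
```

Because `stream` is private, no two threads ever update the same generator state, so no lock is needed around the VSL calls.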

Aaron Digulla
The idea of using the Math Kernel Library is that it is thread-safe: each thread generates its own stream of random numbers without accessing the same global variable.
Bellman
See my edit; the random number generator has some kind of memory where it remembers the parameters for the next number. This counts as "global storage" if you don't allocate one generator per thread.
Aaron Digulla
A: 

I got the solution! I was modifying the generated pseudo-random numbers using some values taken from a file. From time to time, more than one thread tried to read the same file at once, and that caused the corruption. To solve this, I added an omp critical section around the read and it worked.
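
A minimal sketch of that kind of fix, not my actual code: the file name `scale.txt`, the variable names, and the loop bound are made up for illustration. The critical section lets only one thread at a time open and read the file.

```fortran
program critical_read
  implicit none

  integer :: i, u
  real(kind=8) :: scale, total

  ! Create a small input file so the sketch is self-contained
  ! (hypothetical file name and content).
  open(newunit=u, file='scale.txt', status='replace', action='write')
  write(u, *) 2.0d0
  close(u)

  total = 0.0d0

  !$omp parallel do private(u, scale) reduction(+:total)
  do i = 1, 100
     ! Only one thread at a time may run the open/read/close sequence.
     !$omp critical (file_io)
     open(newunit=u, file='scale.txt', status='old', action='read')
     read(u, *) scale
     close(u)
     !$omp end critical (file_io)

     ! Work that does not touch the file stays outside the critical section.
     total = total + scale
  end do
  !$omp end parallel do

  print *, 'sum of values read:', total
end program critical_read
```

Keeping only the file access inside the critical section limits how much of the loop is serialized. If the values do not change between iterations, reading the file once before the parallel loop would avoid the lock entirely.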

Bellman