ansaurus

Question

How can I ensure that my Fortran FORALL construct is being parallelized?

Answer 1

+1 A:

The best way is to measure the clock time of the calculation. Try it with and without the parallel code. If the clock time decreases, then your parallel code is working. The Fortran intrinsic system_clock, called before and after the code block, will give you the clock time. The intrinsic cpu_time will give you the CPU time, which might go up when the code is run multi-threaded due to overhead.
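As a minimal sketch of that measurement (the array size n and the expression inside the FORALL are made-up placeholders, not anything from the question), the timing calls could be wrapped around the construct like this:

```fortran
program time_forall
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 2000            ! hypothetical problem size
  integer(int64) :: count_start, count_end, count_rate
  real :: cpu_start, cpu_end
  real, allocatable :: a(:,:), b(:,:)
  integer :: i, j

  allocate(a(n,n), b(n,n))
  call random_number(b)

  call system_clock(count_start, count_rate)   ! wall-clock ticks and ticks per second
  call cpu_time(cpu_start)                      ! processor time in seconds

  ! Placeholder workload; substitute the FORALL you actually want to test.
  forall (i = 1:n, j = 1:n)
     a(i,j) = sqrt(b(i,j)) + 2.0 * b(j,i)
  end forall

  call cpu_time(cpu_end)
  call system_clock(count_end)

  print '(a,f10.4,a)', 'wall-clock time: ', &
        real(count_end - count_start) / real(count_rate), ' s'
  print '(a,f10.4,a)', 'CPU time:        ', cpu_end - cpu_start, ' s'
  print *, 'checksum: ', sum(a)   ! use the result so the work cannot be optimized away
end program time_forall
```

Build it once with and once without the parallelization options of your compiler and compare the wall-clock figures. A wall-clock time that drops while the CPU time stays flat or rises is the pattern you would expect from multi-threaded execution.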

The lore is that FORALL is not as useful as was thought when it was introduced into the language; it is more of an initialization construct. Compilers are equally adept at optimizing regular loops.

Fortran compilers vary in their abilities to implement true parallel processing without it being explicitly specified, e.g., with OpenMP or MPI. What compiler are you using?

To get automatic multi-threading, I've used ifort. Manually, I've used OpenMP. With both of these, you can compile your program with and without the parallelization and measure the difference.

M. S. B. 2010-09-05 23:10:02

Answer 2

+1 A:

If you use the Intel Fortran Compiler, you can use a command-line switch to turn on or increase the compiler's verbosity level for parallelization/vectorization. That way, during compilation/linking you will be shown something like:

FORALL loop at line X in file Y has been vectorized

I admit that it has been a few years since I last used it, so the actual compiler message might look very different, but that's the basic idea.

exfizik 2010-09-20 04:22:11

I have to get my hands on ifort to see what the exact message is, but this kind of verbosity is exactly what I was looking for! Even for cases of auto-vectorization, I'd like to know which loops are being parallelized and which aren't, particularly for cases where I would assume parallelization should have been possible.

CmdrGuard 2010-09-26 04:55:21

Answer 3

+1 A:

FORALL is an assignment construct, not a looping construct. The semantics of FORALL state that the expression on the right hand side (RHS) of each assignment within the FORALL is evaluated completely before it is assigned to the left hand side (LHS). This has to be done no matter how complex the operations on the RHS, including cases where the RHS and the LHS overlap.
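A small illustration of those semantics, assuming nothing beyond standard Fortran (the arrays a and b are made up for the example): the same shifted assignment is written once as a FORALL and once as a DO loop, with the LHS and RHS overlapping.

```fortran
program forall_semantics
  implicit none
  integer :: a(5), b(5), i

  a = [1, 2, 3, 4, 5]
  b = a

  ! FORALL: every right-hand side is evaluated from the ORIGINAL
  ! contents of a before any element of a is overwritten.
  forall (i = 2:5) a(i) = a(i-1)

  ! DO loop: each iteration reads the value stored by the previous one.
  do i = 2, 5
     b(i) = b(i-1)
  end do

  print '(a,5i3)', 'forall: ', a
  print '(a,5i3)', 'do:     ', b
end program forall_semantics
```

The FORALL leaves a as 1 1 2 3 4, while the DO loop leaves b as 1 1 1 1 1. A compiler has to preserve the first result no matter how it schedules the work, which is what makes the construct harder to optimize than it looks.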

Most compilers punt on optimizing FORALL, both because it is difficult to optimize and because it is not commonly used. The easiest implementation is to simply allocate a temporary for the RHS, evaluate the expression and store it in the temporary, then copy the result into the LHS. Allocation and deallocation of this temporary is likely to make your code run quite slowly. It is very difficult for a compiler to automatically determine when the RHS can be evaluated without a temporary; most compilers don't make any attempt to do so. Nested DO loops turn out to be much easier to analyze and optimize.
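Written out by hand, the "easiest implementation" described above would look roughly like the following. This is only a sketch of the idea, not what any particular compiler actually generates; the array contents and the averaging expression are invented for the example.

```fortran
program forall_temporary
  implicit none
  integer, parameter :: n = 6
  real :: a(n), expected(n)
  real, allocatable :: tmp(:)
  integer :: i

  a = [(real(i*i), i = 1, n)]
  expected = a

  ! Reference result, using the FORALL itself.
  forall (i = 2:n-1) expected(i) = 0.5 * (expected(i-1) + expected(i+1))

  ! Hand-written equivalent: evaluate the whole RHS into a temporary...
  allocate(tmp(2:n-1))
  do i = 2, n-1
     tmp(i) = 0.5 * (a(i-1) + a(i+1))
  end do
  ! ...then copy the temporary into the LHS.
  do i = 2, n-1
     a(i) = tmp(i)
  end do
  deallocate(tmp)

  print *, 'matches FORALL: ', all(a == expected)
end program forall_temporary
```

The extra allocation, the second pass over the data, and the deallocation are the overhead in question. Dropping tmp and assigning straight into a inside the first loop would change the answer here, because a(i-1) would already have been overwritten.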

With some compilers, you may be able to parallelize evaluation of the RHS by enclosing the FORALL with the OpenMP "workshare" directive and compiling with whatever flags are necessary to enable OpenMP, like so:

!$omp parallel workshare
FORALL (i=,j=,...)
END FORALL
!$omp end parallel workshare

gfortran -fopenmp blah.f90 -o blah

Note that a compliant OpenMP implementation (including at least older versions of gfortran) is not required to evaluate the RHS in parallel; it is acceptable for an implementation to perform the evaluation as though it were enclosed in an OpenMP "single" directive. Note also that "workshare" likely will not eliminate the temporary allocated for the RHS. This was the case with an old version of the IBM Fortran compiler on Mac OS X, for instance.
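To check what your own compiler does with workshare, a complete test program might look like the sketch below. The array size and the expression are placeholders; omp_get_wtime and omp_get_max_threads come from the standard omp_lib module, so the program needs to be built with OpenMP enabled.

```fortran
program workshare_forall
  use omp_lib, only: omp_get_wtime, omp_get_max_threads
  implicit none
  integer, parameter :: n = 2000            ! hypothetical problem size
  real, allocatable :: a(:,:), b(:,:)
  double precision :: t0, t1
  integer :: i, j

  allocate(a(n,n), b(n,n))
  call random_number(b)

  t0 = omp_get_wtime()                      ! wall-clock time in seconds
  !$omp parallel workshare
  forall (i = 1:n, j = 1:n)
     a(i,j) = sqrt(b(i,j)) + 2.0 * b(j,i)
  end forall
  !$omp end parallel workshare
  t1 = omp_get_wtime()

  print '(a,i0)',      'max threads: ', omp_get_max_threads()
  print '(a,f10.4,a)', 'wall-clock time: ', t1 - t0, ' s'
  print *, 'checksum: ', sum(a)   ! use the result so the work cannot be optimized away
end program workshare_forall
```

Run it with the OMP_NUM_THREADS environment variable set to 1 and then to the number of cores you have. If the wall-clock time barely changes, the implementation is probably treating the workshare region as "single", as described above.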

Brian 2010-09-21 19:40:12

Hmmm. I had never considered the complexity of the RHS as affecting the possibility for parallelization. Your point, then, is very clear regarding why compilers might punt on optimizing a FORALL loop.

CmdrGuard 2010-09-26 04:59:15
