I've been given a 2D matrix representing temperature points on the surface of a metal plate. The edges of the matrix (plate) are held constant at 20 degrees C and there is a constant heat source of 100 degrees C at one pre-defined point. All other grid points are initially set to 50 degrees C.
My goal is to take all interior grid points and compute its steady-state temperature by iteratively averaging over the surrounding four grid points (i+1, i-1, j+1, j-1) until I reach convergence (a change of less than 0.02 degrees C between iterations).
As far as I know, the order in which I iterate over the grid points is irrelevant.
To me, this sounds like a fine time to invoke the Fortran FORALL
construct and explore the joys of parallelization.
How can I ensure that the code is indeed being parallelized?
For example, I can compile this on my single-core PowerBook G4 and I would expect no improvement in speed due to parallelization. But if I compile on a Dual Core AMD Opteron, I would assume that the FORALL construct can be exploited.
Alternatively, is there a way to measure the effective parallelization of a program?
Update
In response to M.S.B's question, this is with gfortran version 4.4.0. Does gfortran support automatic multi-threading?
That's remarkable that the FORALL construct has been rendered obsolete by, I suppose, what is then auto-vectorization.
Perhaps this is best for a separate question, but how does auto-vectorization work? Is the compiler able to detect that only pure functions or subroutines are being used in a loop?