I use FFTW 3.1.2 with Fortran to perform real-to-complex and complex-to-real FFTs. It works perfectly on one thread.
Unfortunately, I have some problems when I use the multi-threaded FFTW
on a 32-CPU shared-memory computer. I have two plans,
one for 9 real-to-complex FFTs and one for 9 complex-to-real FFTs (size
of each real field: 512*512). I compile my code with ifort, using the
following libraries and options:
-lfftw3f_threads -lfftw3f -lm -lguide -lpthread -mp
The program seems to compile correctly and the function sfftw_init_threads
returns a non-zero integer value, usually 65527.
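For illustration, a minimal sketch of the kind of setup described above might look like the following. It is not the actual program: the array names, the thread count of 2, the FFTW_MEASURE flag, the wall-clock timing and the choice of a single batched sfftw_plan_many_dft_r2c call for the 9 fields are all assumptions, and only the real-to-complex plan is shown. It also assumes that the fftw3.f include file shipped with FFTW is on the include path.

```fortran
program threaded_r2c_sketch
  implicit none
  include 'fftw3.f'   ! FFTW constants (FFTW_MEASURE); shipped with FFTW

  integer, parameter :: nx = 512, ny = 512  ! field size from the post
  integer, parameter :: nfields = 9         ! transforms per plan, from the post
  integer, parameter :: nthreads = 2        ! assumed thread count
  integer   :: iret, c0, c1, rate
  integer   :: n(2), nreal(2), ncplx(2)
  integer*8 :: plan_fwd                     ! plans are 8-byte integers in Fortran
  real,    allocatable :: u(:,:,:)          ! hypothetical input fields
  complex, allocatable :: uhat(:,:,:)       ! hypothetical output spectra

  allocate(u(nx, ny, nfields), uhat(nx/2 + 1, ny, nfields))

  ! Call once, before any other FFTW routine; a zero result means failure.
  call sfftw_init_threads(iret)
  if (iret == 0) stop 'sfftw_init_threads failed'

  ! Plans created after this call may use up to nthreads threads.
  call sfftw_plan_with_nthreads(nthreads)

  ! Dimensions are given in Fortran order; r2c halves the first one.
  n     = (/ nx, ny /)
  nreal = (/ nx, ny /)
  ncplx = (/ nx/2 + 1, ny /)
  call sfftw_plan_many_dft_r2c(plan_fwd, 2, n, nfields, &
       u,    nreal, 1, nx*ny, &
       uhat, ncplx, 1, (nx/2 + 1)*ny, FFTW_MEASURE)

  u = 1.0   ! fill the input after planning: FFTW_MEASURE overwrites the arrays

  ! Wall-clock timing; CPU time would add up the time of all threads.
  call system_clock(c0, rate)
  call sfftw_execute(plan_fwd)
  call system_clock(c1)
  print *, 'wall time (s):', real(c1 - c0) / real(rate)

  call sfftw_destroy_plan(plan_fwd)
end program threaded_r2c_sketch
```

The order of the calls is the point of the sketch: sfftw_init_threads comes first, and sfftw_plan_with_nthreads has to precede the planner call, because a plan keeps the thread count that was in effect when it was created.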
However, even though the program runs and gives correct results, it is slower
with 2 or more threads than with one. A top command shows a strange CPU load
that is larger than 100% (and much larger than n_threads*100). An htop
command shows that one processor (let's say number 1) is working at a
100% load on the program, while all the processors, including
number 1, are also listed as working on this very same program at 0% load,
0% memory and 0 TIME.
If anybody has any idea of what is going on here, thanks a lot!