I use mpi4py and Open MPI on a multi-CPU/multi-core machine to do linear algebra. My NumPy is built against ATLAS. Suppose I have a 4-core machine and would like to run a Python script as 4 MPI processes, each of which does linear algebra with NumPy.
How can I ensure that ATLAS uses no more than one core for the linear algebra in each process? When I build ATLAS, I see no option to configure it to run on only one core at a time. With Intel MKL, I think you can set OMP_NUM_THREADS=1 and this behavior is guaranteed. Is there a way to build ATLAS just for this purpose? There doesn't seem to be an equivalent environment variable.
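To make the MKL comparison concrete, this is roughly the environment-variable approach I have in mind. It is only a minimal sketch: the helper name `limit_blas_threads` is made up for illustration, and `MKL_NUM_THREADS` and `OPENBLAS_NUM_THREADS` are extra variables I am assuming are relevant for MKL and OpenBLAS builds. As far as I can tell, ATLAS does not read any of them, which is the reason for this question.

```python
import os

# Thread-count variables honored by OpenMP-based BLAS builds such as MKL
# or OpenBLAS. Assumption: an ATLAS build ignores all of these.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_blas_threads(n_threads=1):
    """Hypothetical helper: request n_threads BLAS threads in this process."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)


# The variables must be set before NumPy (and its BLAS) is first loaded.
limit_blas_threads(1)

import numpy as np  # deliberately imported after the variables are set

a = np.ones((200, 200))
b = a @ a  # matrix product, dispatched to whatever BLAS NumPy was built against
```

The same effect can be had by exporting the variables in the shell before launching the script under `mpirun`; doing it in Python only works if it happens before the first `import numpy`.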
I am guessing that running several BLAS operations simultaneously on each core of a multicore CPU, so that the cores are oversubscribed, is not a good strategy. Can anyone comment on this, or give rules of thumb for when this is a good or a bad idea?
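The arithmetic behind my guess, written out as a small sketch. The function name `is_oversubscribed` is hypothetical, and the underlying assumption (that trouble starts once processes times threads per process exceeds the number of cores) is my own, not something I have confirmed:

```python
def is_oversubscribed(n_processes, threads_per_process, n_cores):
    """Return True if the total number of BLAS threads exceeds the core count.

    Assumption: every MPI process runs its BLAS calls at the same time,
    each with threads_per_process threads.
    """
    return n_processes * threads_per_process > n_cores


# 4 MPI processes on 4 cores, each with a BLAS that spawns 4 threads:
# 16 threads compete for 4 cores.
case_threaded = is_oversubscribed(4, 4, 4)

# The same 4 processes with a single-threaded BLAS: one thread per core.
case_serial = is_oversubscribed(4, 1, 4)
```

Under that assumption the first case is oversubscribed and the second is not, which is why I want a guaranteed single-threaded ATLAS.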