Hi All

I am setting up a set of computers to run math programs on top of MPI. Do you know whether there is a library that does PCA (Principal Component Analysis) using MPI, so as to use all the resources of the networked PCs? I will have a look at ScaLAPACK, but do you know of other libraries? My language is C++ on Linux, but a good library for Windows would be fine too.

Thanks

+2  A: 

A PCA is a reasonably cheap operation so your ratio of communication (getting data to the nodes) relative to computation (the actual operation, here the PCA) is likely to be relatively poor.

This means that clustering may not be a great solution for this particular problem.

Moreover, PCA is really a linear algebra operation, so you are better off looking at an optimised BLAS such as ATLAS, Goto, MKL, ..., which (these days) can make use of multiple cores. That gives you implicit parallelism, which is easier to use than explicit parallelism with MPI.
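To make the "PCA is linear algebra" point concrete, here is a minimal sketch in plain C++, assuming the data fits in memory as rows of observations. It builds the sample covariance matrix and extracts the first principal component by power iteration. The names `Matrix`, `covariance` and `first_component` are made up for this illustration, and power iteration is only a stand-in: with an optimised BLAS/LAPACK you would hand the covariance matrix to the library's symmetric eigensolver rather than write these loops yourself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical alias for this sketch: rows are observations, columns are variables.
using Matrix = std::vector<std::vector<double>>;

// Sample covariance matrix (d x d) of an n x d data matrix.
Matrix covariance(const Matrix& data) {
    const std::size_t n = data.size();
    assert(n > 1 && "need at least two observations");
    const std::size_t d = data[0].size();

    // Column means.
    std::vector<double> mean(d, 0.0);
    for (const auto& row : data)
        for (std::size_t j = 0; j < d; ++j) mean[j] += row[j] / n;

    // Accumulate centered outer products, scaled by 1 / (n - 1).
    Matrix cov(d, std::vector<double>(d, 0.0));
    for (const auto& row : data)
        for (std::size_t i = 0; i < d; ++i)
            for (std::size_t j = 0; j < d; ++j)
                cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (n - 1);
    return cov;
}

// Leading eigenvector of a symmetric positive semi-definite matrix by
// power iteration. Assumption: the all-ones start vector is not orthogonal
// to the leading eigenvector; a production solver would not rely on that.
std::vector<double> first_component(const Matrix& cov, int iterations = 1000) {
    const std::size_t d = cov.size();
    std::vector<double> v(d, 1.0 / std::sqrt(static_cast<double>(d)));
    for (int it = 0; it < iterations; ++it) {
        // w = cov * v
        std::vector<double> w(d, 0.0);
        for (std::size_t i = 0; i < d; ++i)
            for (std::size_t j = 0; j < d; ++j) w[i] += cov[i][j] * v[j];

        // Normalise w to unit length and use it as the next iterate.
        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        if (norm == 0.0) break;  // zero matrix: nothing to extract
        for (std::size_t i = 0; i < d; ++i) v[i] = w[i] / norm;
    }
    return v;
}
```

For points lying exactly on the line y = 2x, `first_component(covariance(data))` points along (1, 2) normalised to unit length. The covariance step costs O(n d^2) and is a matrix product at heart, which is exactly the kind of work a multi-threaded BLAS speeds up without any explicit message passing.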

Do not get me wrong -- I really like MPI (and have some tutorials here on using it with R) but you need to keep in mind that not all tools are appropriate for all problems.

Dirk Eddelbuettel
Thanks for the response. Can I ask you one more simple question? Do you think the same applies when I have a huge amount of data? I need to process 40000 vectors, each 12 GB in size. I still need to study the things you mentioned in your response, sorry about that.
How much RAM do you have on each node? Will each of these fit in memory? If so, you can use ATLAS, MKL, ... to do the PCA on each, and at the same time use MPI to send the many matrices around to handle the data-parallel part.
Dirk Eddelbuettel
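As a rough sketch of the data-parallel part mentioned in the comment above: each MPI process can work out which of the matrices it owns from its rank alone, run a local PCA on each one, and only communicate the results. The helper below is plain C++ with a made-up name (`block_range`). In a real MPI program the `rank` and `size` arguments would come from `MPI_Comm_rank` and `MPI_Comm_size`, and the even block split is just one possible assignment.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper: half-open range [first, last) of item indices owned
// by `rank` when `total` items are split as evenly as possible over `size`
// processes. The first (total % size) ranks receive one extra item.
std::pair<std::size_t, std::size_t> block_range(std::size_t total, int rank, int size) {
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t p = static_cast<std::size_t>(size);
    const std::size_t base = total / p;   // items every rank gets
    const std::size_t extra = total % p;  // leftover items to spread out
    const std::size_t first = r * base + std::min(r, extra);
    const std::size_t count = base + (r < extra ? 1 : 0);
    return {first, first + count};
}
```

With 40000 items and, say, 16 processes (the process count is an assumption for illustration), every rank gets 2500 items, so rank 0 handles indices 0 to 2499. Each rank then loops over its own range and calls the single-node PCA routine, which keeps communication down to distributing inputs and collecting results.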