This question came at the right time, as I'm struggling with optimization as well. I am aware of the different "normal" optimization routines in R, and of parallel packages like snow, snowfall, Rmpi and the like. Yet I haven't managed to get an optimization running in parallel on my computer.

Some toy code to illustrate:

f <- function(x) sum((x - 1:length(x))^2)  # minimised at x = 1:length(x)
a <- 1:5                                   # starting values
optim(a, f)
nlm(f, a)

What I want to do is parallelize the optim() function (or the nlm() function, which does basically the same thing). My real function f() is a lot more complicated, and one optimization round lasts about half an hour. If I want to run a simulation of 100 samples, it takes ages. I'd like to avoid writing my own Newton-like algorithm for parallel computing, so I hope somebody can give me some hints on how to use parallel computing for complex optimization problems in R.
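For illustration, the closest I've come is parallelizing over the independent samples rather than inside optim() itself, along these lines with snowfall (fitOne and the dummy samples here are placeholders for my real model, and the cpus count is arbitrary):

library(snowfall)

## placeholder objective: in reality f() is much more complicated
f <- function(x, sample) sum((x - sample)^2)

## one independent optimization per simulated sample
fitOne <- function(sample) optim(rep(0, length(sample)), f, sample = sample)

## dummy data standing in for the 100 simulated samples
samples <- replicate(100, rnorm(5, mean = 1:5), simplify = FALSE)

sfInit(parallel = TRUE, cpus = 4)   # or point this at the cluster's cores
sfExport("f")
results <- sfLapply(samples, fitOne)
sfStop()

That only helps when there are many independent runs, though; a single half-hour optimization still runs on one core.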


I reckon this problem is of a different nature than the one in the related question. My request is specifically directed towards parallel computing, not some faster alternative for optim.

A: 

If I were after speed, the first thing I would do is go from R to C or C++. (Also Fortran, I hate to say.) When I had squeezed out every possible cycle using this technique, I would introduce MPI to take advantage of parallel hardware.

Mike Dunlavey
I have a cluster with 20+ cores available, and I want to use them. I have about 5 PhD students who churn out exotic new models at a furious rate, and all they know is R. So C / Fortran is not an option (unless I write the engine myself and call that code from within R, which is basically how many R functions work anyway). Bottom line: I really need **parallel** optimization thingies in **R**.
Joris Meys
Switching to C before going parallel in R is naive, and assumes that hiring C developers is cheaper than going parallel in R. This might be the case for @mike but clearly not the case for @joris. Any development choice is an optimization subject to a budget constraint. It's very important to be realistic about that constraint.
JD Long
@Joris: @JD: Just trying to help.
Mike Dunlavey
@Mike : for the record, I found the provided link very interesting, and I appreciate the effort you took to give me an answer. It just wasn't an answer to my question ;-).
Joris Meys
@mike, that's totally fair. I was probably a bit terse because it's very common, when an R person asks a performance question, for the first response from the CS crowd to be "well, you have to get out of R and into a 'real language' like C", which isn't particularly helpful ;) Thank you for giving some input.
JD Long
@JD: Yeah, it's always a risk on SO that you don't know how literally an OP means his/her question. The thing about interpreted languages like R is that they bring people in with ease of use, and later people want performance, but you can't *really* have both very easily. Matlab has some sort of compiler, FWIW.
Mike Dunlavey
@Mike Dunlavey: You shouldn't underestimate R, but indeed you have to know how to use it to make it perform. The main reason for using R in my case is the huge number of statistical techniques that are readily available. And I'm not talking about calculating means ;-)
Joris Meys
@Joris: I certainly don't wish to underestimate R. I work with people who use it heavily. The value is in what it makes easy for you, the things you can get done well in minimal code, not in its performance. In fact our experts use it as a scripting language to fire up other tools and manipulate the results, tools like NONMEM, WinBUGS, and our own language for pharmacometric mixed-effect modeling.
Mike Dunlavey
@Mike: have you taken a look at the inline package?
Joshua Ulrich
@Joshua: I'm afraid I don't know what that is.
Mike Dunlavey
@Mike: [inline](http://cran.r-project.org/web/packages/inline/) contains "Functionality to dynamically define R functions and S4 methods with in-lined C, C++ or Fortran code supporting .C and .Call calling conventions." Dirk Eddelbuettel has [an example](http://dirk.eddelbuettel.com/blog/2009/12/20/#rcpp_inline_example) on his blog.
Joshua Ulrich
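To make that concrete, here is a minimal sketch of what inline allows, assuming the default .Call convention and rewriting the toy objective from the question in C (the name fC is arbitrary):

library(inline)

## C version of the toy objective f(x) = sum((x - 1:length(x))^2),
## compiled on the fly the first time this runs
code <- "
  int n = LENGTH(x);
  double s = 0.0;
  for (int i = 0; i < n; i++) {
    double d = REAL(x)[i] - (i + 1);
    s += d * d;
  }
  return ScalarReal(s);
"
fC <- cfunction(sig = c(x = "numeric"), body = code, language = "C")

fC(as.numeric(1:5))          # should return 0
optim(as.numeric(1:5) + 0.5, fC)

The compiled objective can then be handed straight to optim() or nlm(), which is where the speed-up would come from.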
A: 

Sprint might be of interest. I know nothing about it but stumbled across it recently.

High Performance Mark
Thanks for the pointer, but I already knew about it. There are more frameworks for parallel computing in R, depending on the protocol you want to use. Yet I couldn't find a non-beta optimization function that uses the power of parallel computing.
Joris Meys
A: 

To answer my own question:

There is a package in development that looks promising. It has particle swarm optimization methods and builds on the Rmpi package for parallel computing. It can be found on RForge:

http://www.rforge.net/ppso/index.html

It's still in beta AFAIK, but it looks promising. I'm going to take a look at it later on and will report back when I know more. Still, I'll leave the question open, so if anybody else has another option...

Joris Meys
If you're considering PSO, have you thought about differential evolution (via the DEoptim package)? Parallel computing support is on the package's to-do list and shouldn't take more than a few hours of work (for me, not you :-).
Joshua Ulrich
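For reference, a minimal sketch of what that would look like on the toy problem from the question, assuming the DEoptim() interface, with arbitrarily chosen bounds and control settings (serial for now, since parallel support is still on the to-do list):

library(DEoptim)

f <- function(x) sum((x - 1:length(x))^2)

## wide, arbitrary box constraints for the 5-parameter toy problem
lower <- rep(-100, 5)
upper <- rep( 100, 5)

set.seed(1)
res <- DEoptim(f, lower, upper,
               control = DEoptim.control(NP = 50, itermax = 200, trace = FALSE))

res$optim$bestmem   # should be close to 1:5
res$optim$bestval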
@Joshua Thanks for the tip, I didn't know about DEoptim yet. It looks promising, but for the problem I'm working on now it's actually quite a bit slower than nlm(). I have 13 parameters and no clear lower and upper limits, so I have to set them rather wide to avoid missing a parameter...
Joris Meys
I tried the beta out and it seems to work. On my problem it still doesn't provide the same improvement as parallelizing the function itself, but I can see that in other cases this would really be a useful tool. I'm looking forward to the first stable release.
Joris Meys