Running R jobs on a grid computing environment

I am running some large regression models in R in a grid computing environment. As far as I know, the grid just gives me more memory and faster processors, so I think this question would also apply for those who are using R on a powerful computer.

The regression models I am running have lots of observations, and several factor variables that have many (10s or 100s) of levels each. As a result, the regression can get computationally intensive. I have noticed that when I line up 3 regressions in a script and submit it to the grid, it exits (crashes) due to memory constraints. However, if I run it as 3 different scripts, it runs fine.

I'm doing some clean up, so after each model runs, I save the model object to a separate file, rm(list=ls()) to clear all memory, then run gc() before the next model is run. Still, running all three in one script seems to crash, but breaking up the job seems to be fine.

The sys admin says that breaking it up is important, but I don't see why, if I'm cleaning up after each run. 3 in one script runs them in sequence anyways. Does anyone have an idea why running three individual scripts works, but running all the models in one script would cause R to have memory issues?

thanks! EXL

ansaurus

tags:

views:

answers:

Running R jobs on a grid computing environment

related questions