These answers work, but they all require a call to a non-deterministic function like sample()
in the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for
loop. The trick is to use the n
argument and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n
calls, which matters if your n
is large. Here is a quick example random walk (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)
steps <- 30
n <- 100
directions <- c(-1, 1)
results <- vector('list', n)
for (i in seq_len(n)) {
walk <- numeric(steps)
for (s in seq_len(steps)) {
walk[s] <- sample(directions, 1)
}
results[[i]] <- sum(walk)
}
We can rewrite this with one call to sample()
:
all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)
Proof of speed increase (n=10000):
> system.time({
+ for (i in seq_len(n)) {
+ walk <- numeric(steps)
+ for (s in seq_len(steps)) {
+ walk[s] <- sample(directions, 1)
+ }
+ results[[i]] <- sum(walk)
+ }})
user system elapsed
4.231 0.332 4.758
> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
user system elapsed
0.010 0.001 0.012
If your simulation needs just one random variable per simulation function call, use sapply()
, or better yet the multicore
package's mclapply()
. Revolution Analytics's foreach
package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR here (I can't find the video, but I'm sure someone will know).
Take home points:
- Preallocate with
numeric(n)
or vector('list', n)
- Push invariant code out of
for
loops. Cleverly push stochastic functions out of code with their n
argument.
- Try hard for
sapply()
or lapply()
, or better yet mclapply
.
- Don't use
x <- c(x, rnorm(100))
. Every time you do this, a member of R-core kills a puppy.