tags:

views:

106

answers:

4

I would like to implement a simulation program, which requires the following structure:

It has a for loop, the program will generate an vector in each iteration. I need each generated vector is appended to the existing vector.

I do not how how to do this in R. Thanks for the help.

A: 

Probably the best thing you can do is preallocate a list of length n (n is number of iterations) and flatten out the list after you're done.

n <- 10
start <- vector("list", n)
for (i in 1:n) {
  a[[i]] <- sample(10)
}
start <- unlist(start)

You could do it the old nasty way. This may be slow for larger vectors.

start <- c()
for (i in 1:n) {
  add <- sample(10)
  start <- c(start, add)
}
Roman Luštrik
*Don't* **ever** do it this way (the latter way). R doesn't grow vectors geometrically like Python when it runs out of room, so running out of space means (1) reallocation of a vector n+1 elements long, (2) n copy operations, and then (3) it will do this *every* iteration. Always pre-allocate.
Vince
Agree with Vince. You said yourself to preallocate a list. You should do exactly the same with a vector.
Joris Meys
I'm just putting out all the options I know. The latter way works for small (sometimes testing) scale purposes. If you're doing something large you need to think about this before hand and working with pre-allocated vectors is actually more readable.
Roman Luštrik
A: 
x <- rnorm(100)
for (i in 100) {
   x <- c(x, rnorm(100))
}

This link should be useful: http://www.milbo.users.sonic.net/ra/

aL3xa
No: do **not** do it this way: see my comment here: http://stackoverflow.com/questions/3595459/vector-binding-in-r/3595593#3595593
Vince
And don't use Ra to speed up bad code. It's still experimental and it's cleaner to just write it correctly.
Vince
OK, thanks for clarifying!
aL3xa
A: 

Assuming your simulation function -- call it func -- returns a vector with the same length each time, you can store the results in the columns of a pre-allocated matrix:

sim1 <- function(reps, func) {
  first <- func()
  result <- matrix(first, nrow=length(first), ncol=reps)
  for (i in seq.int(from=2, to=reps - 1)) {
    result[, i] <- func()
  }
  return(as.vector(result))
}

Or you could express it as follows using replicate:

sim2 <- function(reps, func) {
  return(as.vector(replicate(reps, func(), simplify=TRUE)))
}

> sim2(3, function() 1:3)
[1] 1 2 3 1 2 3 1 2 3
David F
+6  A: 

These answers work, but they all require a call to a non-deterministic function like sample() in the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for loop. The trick is to use the n argument and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n calls, which matters if your n is large. Here is a quick example random walk (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)

steps <- 30
n <- 100
directions <- c(-1, 1)

results <- vector('list', n)

for (i in seq_len(n)) {
  walk <- numeric(steps)
  for (s in seq_len(steps)) {
    walk[s] <- sample(directions, 1)
  }
  results[[i]] <- sum(walk)
}

We can rewrite this with one call to sample():

all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)

Proof of speed increase (n=10000):

> system.time({
+ for (i in seq_len(n)) {
+   walk <- numeric(steps)
+   for (s in seq_len(steps)) {
+     walk[s] <- sample(directions, 1)
+   }
+   results[[i]] <- sum(walk)
+ }})
   user  system elapsed 
  4.231   0.332   4.758 

> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
   user  system elapsed 
  0.010   0.001   0.012 

If your simulation needs just one random variable per simulation function call, use sapply(), or better yet the multicore package's mclapply(). Revolution Analytics's foreach package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR here (I can't find the video, but I'm sure someone will know).

Take home points:

  • Preallocate with numeric(n) or vector('list', n)
  • Push invariant code out of for loops. Cleverly push stochastic functions out of code with their n argument.
  • Try hard for sapply() or lapply(), or better yet mclapply.
  • Don't use x <- c(x, rnorm(100)). Every time you do this, a member of R-core kills a puppy.
Vince
Awesome answer, especially without having coffee! I assume `all.walk` comes from the lack of coffee... ;-)
Joshua Ulrich
Thanks Joshua! I corrected the `all.walk` bit.
Vince
There's another `all.walk` in the third block of code.
Joshua Ulrich
To make the code reproducable, add set.seed() at the top.
Joris Meys
You could speed up more by using `rowSums(all.steps)` instead of `apply(all.steps, 1, sum)`.
Marek
Definitely, thanks for catching this Marek.
Vince