ansaurus

Question

vector binding in R

Answer 1

A:

Probably the best thing you can do is preallocate a list of length n (n is number of iterations) and flatten out the list after you're done.

n <- 10
start <- vector("list", n)
for (i in 1:n) {
  a[[i]] <- sample(10)
}
start <- unlist(start)

You could do it the old nasty way. This may be slow for larger vectors.

start <- c()
for (i in 1:n) {
  add <- sample(10)
  start <- c(start, add)
}

Roman Luštrik 2010-08-29 16:39:29

*Don't* **ever** do it this way (the latter way). R doesn't grow vectors geometrically like Python when it runs out of room, so running out of space means (1) reallocation of a vector n+1 elements long, (2) n copy operations, and then (3) it will do this *every* iteration. Always pre-allocate.

Vince 2010-08-29 17:18:54

Agree with Vince. You said yourself to preallocate a list. You should do exactly the same with a vector.

Joris Meys 2010-08-30 07:38:49

I'm just putting out all the options I know. The latter way works for small (sometimes testing) scale purposes. If you're doing something large you need to think about this before hand and working with pre-allocated vectors is actually more readable.

Roman Luštrik 2010-08-31 07:24:10

Answer 2

A:

x <- rnorm(100)
for (i in 100) {
   x <- c(x, rnorm(100))
}

This link should be useful: http://www.milbo.users.sonic.net/ra/

aL3xa 2010-08-29 16:44:36

No: do **not** do it this way: see my comment here: http://stackoverflow.com/questions/3595459/vector-binding-in-r/3595593#3595593

Vince 2010-08-29 18:04:36

And don't use Ra to speed up bad code. It's still experimental and it's cleaner to just write it correctly.

Vince 2010-08-29 18:24:14

OK, thanks for clarifying!

aL3xa 2010-08-29 23:22:35

Answer 3

A:

Assuming your simulation function -- call it func -- returns a vector with the same length each time, you can store the results in the columns of a pre-allocated matrix:

sim1 <- function(reps, func) {
  first <- func()
  result <- matrix(first, nrow=length(first), ncol=reps)
  for (i in seq.int(from=2, to=reps - 1)) {
    result[, i] <- func()
  }
  return(as.vector(result))
}

Or you could express it as follows using replicate:

sim2 <- function(reps, func) {
  return(as.vector(replicate(reps, func(), simplify=TRUE)))
}

> sim2(3, function() 1:3)
[1] 1 2 3 1 2 3 1 2 3

David F 2010-08-29 17:42:04

Answer 4

+6 A:

These answers work, but they all require a call to a non-deterministic function like sample() in the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for loop. The trick is to use the n argument and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n calls, which matters if your n is large. Here is a quick example random walk (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)

steps <- 30
n <- 100
directions <- c(-1, 1)

results <- vector('list', n)

for (i in seq_len(n)) {
  walk <- numeric(steps)
  for (s in seq_len(steps)) {
    walk[s] <- sample(directions, 1)
  }
  results[[i]] <- sum(walk)
}

We can rewrite this with one call to sample():

all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)

Proof of speed increase (n=10000):

> system.time({
+ for (i in seq_len(n)) {
+   walk <- numeric(steps)
+   for (s in seq_len(steps)) {
+     walk[s] <- sample(directions, 1)
+   }
+   results[[i]] <- sum(walk)
+ }})
   user  system elapsed 
  4.231   0.332   4.758 

> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
   user  system elapsed 
  0.010   0.001   0.012

If your simulation needs just one random variable per simulation function call, use sapply(), or better yet the multicore package's mclapply(). Revolution Analytics's foreach package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR here (I can't find the video, but I'm sure someone will know).

Take home points:

Preallocate with numeric(n) or vector('list', n)
Push invariant code out of for loops. Cleverly push stochastic functions out of code with their n argument.
Try hard for sapply() or lapply(), or better yet mclapply.
Don't use x <- c(x, rnorm(100)). Every time you do this, a member of R-core kills a puppy.

Vince 2010-08-29 17:53:20

Awesome answer, especially without having coffee! I assume `all.walk` comes from the lack of coffee... ;-)

Joshua Ulrich 2010-08-29 19:33:39

Thanks Joshua! I corrected the `all.walk` bit.

Vince 2010-08-29 23:43:56

There's another `all.walk` in the third block of code.

Joshua Ulrich 2010-08-30 00:05:54

To make the code reproducable, add set.seed() at the top.

Joris Meys 2010-08-30 07:39:43

You could speed up more by using `rowSums(all.steps)` instead of `apply(all.steps, 1, sum)`.

Marek 2010-08-30 09:37:00

Definitely, thanks for catching this Marek.

Vince 2010-08-30 15:11:51

ansaurus

tags:

views:

answers:

vector binding in R

related questions