views:

567

answers:

3

I'm implementing an algorithm that involves lots of adding and removing things from sets. In R, this is slow because as far as I know, adding or removing things from a vector is slow, since the entire vector has to be re-allocated. Is there a way do do it more efficiently?

Edit: My current solution is to use a boolean vector of the same length as the list of things that can be in the set, and using that as a membership table.

+2  A: 

If you can, initializing a vector so that it has length equal to its maximum length during the algorithm may help.

e.g.

vec <- rep(NA,10)
vec[1:3] <- 1:3
vec[4:5] <- 4:5
vec[6:10] <- 6:10

rather than

vec <- 1:3
vec <- c(vec,4:5)
vec <- c(vec,6:10)

compare

> system.time({vec <- rep(NA,10^4); for (i in 1:(10^4)) vec[i] <- i })
   user  system elapsed 
  0.043   0.001   0.044 

to

> system.time({vec <- NULL; for (i in 1:(10^4)) vec <- c(vec,i) })
   user  system elapsed 
  0.249   0.089   0.335
wkmor1
But I could possibly be removing any item in the vector, not just the last.
Ryan Thompson
Sure. The loops are just examples to show the difference between initializing a vector and building one with repeated assignment. I obviously haven't seen your algorithm but this method should still work though, as 'i' could be any number at any point in your algorithm. It only relies on knowing the length of your vector to begin with. Also, instead of removing it you'd just assign that element as 'NA' and maintain a vector of the same length.
wkmor1
@wkmor1 You don't need to use `eval/parse`, it's enough when you take expression in braces, i.e.: `system.time({vec <- rep(NA,10^4);for (i in 1:(10^4)) vec[i] <- i})`
Marek
@Marek. Fair enough. Thanks for that.
wkmor1
+2  A: 

Chapter 2 of The R inferno has some interesting comments on this, including perdiodic growing objects to reduce memory fragmentation and allocation overhead.

If you know what the ultimate size of the set is, then the method you suggest is probably the best - ie subset from the whole universe using an approprate membership vector. Difficult to know whats best without seeing exactly what you are trying to do though.

James
+1 for referencing The R Inferno
Stedy
I ended up with a boolean vector representing set membership.
Ryan Thompson
A: 

It's hard to say what you want. Maybe you really want stack commands like push and pop. The following isn't that. But it's a fast solution.

Allocate a vector large enough to hold all of your items of the type you need. Set every value to NA. Adding items is simple. Removing items is setting them to NA again. Using the vector is just na.omit(myVec)

myVec <- numeric (maxLength) # a vector of maximum length

is.na(myVec) <- 1:maxLength # set every item in myVec to NA

myVec[c(2,6,20)] <- 5 # add some values

na.omit(myVec)

This will also work if you can initialize all of your values to something that you know you won't need. Then you can

John
An R stack: http://rosettacode.org/wiki/Stack#R
Richie Cotton
@John: `myVec <- rep(NA, maxLength)` replaces your first two lines more cleanly. Also, what's with the big writing?
Richie Cotton