...regarding execution time and / or memory.
If this is not true, prove it with a code snippet. Note that speedup by vectorization does not count. The speedup most come from *apply (tapply, sapply ...) itself.
Thanks in advance
...regarding execution time and / or memory.
If this is not true, prove it with a code snippet. Note that speedup by vectorization does not count. The speedup most come from *apply (tapply, sapply ...) itself.
Thanks in advance
Sometimes speedup can be substantial, like when you have to nest for-loops to get the average based on a grouping of more than one factor. Here you have two approaches that give you the exact same result :
set.seed(1) #for reproducability of the results
# The data
X <- rnorm(100000)
Y <- as.factor(sample(letters[1:5],100000,replace=T))
Z <- as.factor(sample(letters[1:10],100000,replace=T))
# the function forloop that averages X over every combination of Y and Z
forloop <- function(x,y,z){
# These ones are for optimization, so the functions
#levels() and length() don't have to be called more than once.
ylev <- levels(y)
zlev <- levels(z)
n <- length(ylev)
p <- length(zlev)
out <- matrix(NA,ncol=p,nrow=n)
for(i in 1:n){
for(j in 1:p){
out[i,j] <- (mean(x[y==ylev[i] & z==zlev[j]]))
}
}
rownames(out) <- ylev
colnames(out) <- zlev
return(out)
}
# Used on the generated data
forloop(X,Y,Z)
# The same using tapply
tapply(X,list(Y,Z),mean)
Both give exactly the same result, being a 5 x 10 matrix with the averages and named rows and columns. But :
> system.time(forloop(X,Y,Z))
user system elapsed
0.94 0.02 0.95
> system.time(tapply(X,list(Y,Z),mean))
user system elapsed
0.06 0.00 0.06
There you go. What did I win? ;-)