This may seem as a typical plyr
problem, but I have something different in mind.
Here's the function that I want to optimize (skip the for
loop).
# dummy data
set.seed(1985)
lst <- list(a=1:10, b=11:15, c=16:20)
m <- matrix(round(runif(200, 1, 7)), 10)
m <- as.data.frame(m)
dfsub <- function(dt, lst, fun) {
# check whether dt is `data.frame`
stopifnot (is.data.frame(dt))
# check if vectors in lst are "whole" / integer
# vector elements should be column indexes
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
# fall if any non-integers in list
idx <- rapply(lst, is.wholenumber)
stopifnot(idx)
# check for list length
stopifnot(ncol(dt) == length(idx))
# subset the data
subs <- list()
for (i in 1:length(lst)) {
# apply function on each part, by row
subs[[i]] <- apply(dt[ , lst[[i]]], 1, fun)
}
# preserve names
names(subs) <- names(lst)
# convert to data.frame
subs <- as.data.frame(subs)
# guess what =)
return(subs)
}
And now a short demonstration... actually, I'm about to explain what I primarily intended to do. I wanted to subset a data.frame
by vectors gathered in list
object. Since this is a part of code from a function that accompanies data manipulation in psychological research, you can consider m
as a results from personality questionnaire (10 subjects, 20 vars). Vectors in list hold column indexes that define questionnaire subscales (e.g. personality traits). Each subscale is defined by several items (columns in data.frame
). If we presuppose that the score on each subscale is nothing more than sum
(or some other function) of row values (results on that part of questionnaire for each subject), you could run:
> dfsub(m, lst, sum)
a b c
1 46 20 24
2 41 24 21
3 41 13 12
4 37 14 18
5 57 18 25
6 27 18 18
7 28 17 20
8 31 18 23
9 38 14 15
10 41 14 22
I took a glance at this function and I must admit that this little loop isn't spoiling the code at all... BUT, if there's an easier/efficient way of doing this, please, let me know!