ansaurus

Question

Subset a data.frame by list and apply function on each part, by rows

Answer 1

+2 A:

After loading the plyr package, replace

subs <- list()
for (i in 1:length(lst)) {
    # apply function on each part, by row
    subs[[i]] <- apply(dt[ , lst[[i]]], 1, fun)
}

with

subs <- llply(lst, function(x) apply(dt[, x], 1, fun))
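A quick way to convince yourself that the one-liner does the same job as the loop is to run both on toy data. This is a minimal sketch: the `dt`, `lst` and `fun` values below are hypothetical stand-ins for the question's objects, and it uses base `lapply` (which, for this use, behaves like plyr's `llply`) so it runs without plyr. It also adds `drop = FALSE`, an assumption on my part, so that a part made of a single column stays a data frame that `apply` can handle.

```r
# Hypothetical toy inputs standing in for the question's dt, lst and fun
dt  <- data.frame(V1 = c(1, 2), V2 = c(3, 4), V3 = c(5, 6))
lst <- list(first = c(1, 2), second = 3)
fun <- sum

# Loop version; drop = FALSE keeps one-column subsets as data frames
subs_loop <- list()
for (i in seq_along(lst)) {
    subs_loop[[i]] <- apply(dt[, lst[[i]], drop = FALSE], 1, fun)
}

# Same computation in one line (lapply keeps the names of lst)
subs_lapply <- lapply(lst, function(x) apply(dt[, x, drop = FALSE], 1, fun))
```

The only difference is that the `lapply` result is named after the elements of `lst`, while the loop result is not.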

gd047 2010-02-28 14:24:07

Thanks for the reply! The `llply` approach did shorten the code a bit, but the previous version has one advantage: it depends only on the `base` package. Admittedly that's a trivial advantage, since the first packages I install are `plyr` and `reshape`.

aL3xa 2010-02-28 15:34:07

Oh, I misunderstood! I thought you wanted to use plyr. You just have to use `lapply` instead of `llply`: `subs <- lapply(lst, function(x) apply(dt[, x], 1, fun))`

gd047 2010-02-28 18:26:27

No, you got it right! It's only a matter of preference. I figured out that I must use `lapply`, since `sapply` gives character vectors as output.
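The difference between `lapply` and `sapply` here is in how the results are shaped. A small base-R sketch with hypothetical toy data (`dt`, `lst` and `row_sums` are illustrative names, not from the question): `lapply` always returns a list, while `sapply` simplifies equal-length numeric results into a matrix. Where character output does appear, one plausible cause (an assumption, since the question's data aren't shown) is `apply` itself, which first converts a data frame with any non-numeric column into a character matrix.

```r
# Hypothetical toy inputs
dt  <- data.frame(V1 = c(1, 2), V2 = c(3, 4), V3 = c(5, 6))
lst <- list(first = c(1, 2), second = c(2, 3))
row_sums <- function(x) apply(dt[, x, drop = FALSE], 1, sum)

# lapply() always returns a list, one element per part
by_lapply <- lapply(lst, row_sums)

# sapply() simplifies equal-length results into a matrix (one column per part)
by_sapply <- sapply(lst, row_sums)

# apply() converts a data frame to a matrix first; with a character column
# that matrix is character, so each row reaches the function as character
mixed <- data.frame(x = 1:2, y = c("a", "b"))
row_classes <- apply(mixed, 1, class)
```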

aL3xa 2010-02-28 20:01:52

Answer 2

+3 A:

I'd take a different approach and keep everything as data frames so that you can use merge and ddply. I think you'll find this approach is a little more general, and it's easier to check that each step is performed correctly.

library(reshape)  # melt()
library(plyr)     # ddply(), summarise()

# Convert everything to long data frames
m$id <- 1:nrow(m)

obs <- melt(m, id = "id")
obs$variable <- as.numeric(gsub("V", "", obs$variable))

varinfo <- melt(lst)
names(varinfo) <- c("variable", "scale")

# Merge and summarise
obs <- merge(obs, varinfo, by = "variable")

# one row per (id, scale) combination
ddply(obs, c("id", "scale"), summarise,
  mean = mean(value),
  sum = sum(value))
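For readers who want the same long-format idea without reshape and plyr, here is a rough base-R sketch. It is an adaptation, not the answer's code: `aggregate()` stands in for `ddply()`/`summarise`, the long table is built by hand instead of with `melt()`, and the toy `m` and `lst` are hypothetical (numeric columns `V1`..`V4`, with `lst` mapping each scale name to its column indexes).

```r
# Hypothetical toy inputs
m <- data.frame(V1 = c(1, 2, 3), V2 = c(4, 5, 6),
                V3 = c(7, 8, 9), V4 = c(1, 1, 1))
lst <- list(a = c(1, 2), b = c(3, 4))

# Long format: one row per (id, column index) pair;
# unlist() on a data frame concatenates its columns in order
obs <- data.frame(id       = rep(seq_len(nrow(m)), times = ncol(m)),
                  variable = rep(seq_len(ncol(m)), each = nrow(m)),
                  value    = unlist(m, use.names = FALSE))

# Lookup table: which scale each column index belongs to
varinfo <- data.frame(variable = unlist(lst, use.names = FALSE),
                      scale    = rep(names(lst), lengths(lst)))

# Merge, then summarise per id and scale
obs   <- merge(obs, varinfo, by = "variable")
sums  <- aggregate(value ~ id + scale, data = obs, FUN = sum)
means <- aggregate(value ~ id + scale, data = obs, FUN = mean)
names(sums)[3]  <- "sum"
names(means)[3] <- "mean"
res <- merge(sums, means, by = c("id", "scale"))
```

As in the plyr version, each step produces an ordinary data frame that can be printed and checked before moving on.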

hadley 2010-02-28 15:31:01

Answer 3

A:

@Hadley, I've accepted your answer since it's quite straightforward and makes bookkeeping easy (besides being a more general-purpose solution). However, here's my fairly short script that does the job and requires only the base package (a trivial advantage, since I install plyr and reshape right after installing R). Here's the source:

dfsub <- function(dt, lst, fun) {
    # check whether dt is a `data.frame`
    stopifnot(is.data.frame(dt))
    # convert all columns to numeric
    # (note: factors become their integer level codes, not their labels)
    dt <- as.data.frame(lapply(dt, as.numeric))
    # vector elements in lst should be column indexes,
    # so check that they are whole numbers
    is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
    # fail if there are any non-integers in the list
    idx <- rapply(lst, is.wholenumber)
    stopifnot(idx)
    # check that the total number of indexes equals the number of columns
    stopifnot(ncol(dt) == length(idx))
    # subset the data
    subs <- list()
    for (i in seq_along(lst)) {
        # apply function on each part, by row;
        # drop = FALSE keeps a single-column part as a data frame
        subs[[i]] <- apply(dt[ , lst[[i]], drop = FALSE], 1, fun)
    }
    names(subs) <- names(lst)
    # convert to data.frame
    subs <- as.data.frame(subs)
    # return the result
    return(subs)
}
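Two base-R behaviours are worth keeping in mind with a function like this. First, `as.numeric()` on a factor returns the internal level codes rather than the printed values (R FAQ 7.10 recommends `as.numeric(levels(f))[f]` or `as.numeric(as.character(f))`). Second, selecting a single column with `[` drops the result to a plain vector, which `apply()` rejects. A minimal sketch with hypothetical toy data (`f`, `dt` and the other names are illustrative only):

```r
# Pitfall 1: factor -> numeric gives level codes, not values
f <- factor(c("10", "20", "10", "5"))      # levels sort as "10", "20", "5"
codes   <- as.numeric(f)                   # level codes
values  <- as.numeric(as.character(f))     # printed values
values2 <- as.numeric(levels(f))[f]        # R FAQ 7.10 idiom, same result

# Pitfall 2: one-column subsets drop to a vector unless drop = FALSE
dt <- data.frame(a = 1:3, b = 4:6)
dropped <- dt[, 1]                         # plain integer vector, no dim
kept    <- dt[, 1, drop = FALSE]           # still a data.frame
row_sums <- apply(kept, 1, sum)            # works
failed <- inherits(try(apply(dropped, 1, sum), silent = TRUE), "try-error")
```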

aL3xa 2010-03-13 11:06:40
