ansaurus

Question

Answer 1

+1 A:

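You want tapply or ave, depending on the shape of output you need. ave returns one value per row (each row gets its group's mean, since mean is the default function), while tapply returns one value per group: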

> Data <- data.frame(grp=sample(letters[1:3],20,TRUE),x=rnorm(20))
> ave(Data$x, Data$grp)
 [1] -0.3258590 -0.5009832 -0.5009832 -0.2136670 -0.3258590 -0.5009832
 [7] -0.3258590 -0.2136670 -0.3258590 -0.2136670 -0.3258590 -0.3258590
[13] -0.3258590 -0.5009832 -0.2136670 -0.5009832 -0.3258590 -0.2136670
[19] -0.5009832 -0.2136670
> tapply(Data$x, Data$grp, mean)
         a          b          c 
-0.5009832 -0.2136670 -0.3258590 

# Example with more than one column:
> Data <- data.frame(grp=sample(letters[1:3],20,TRUE),x=rnorm(20),y=runif(20))
> # colMeans is used here because mean() on a data.frame returns NA in current versions of R
> do.call(rbind, lapply(split(Data[,-1], Data[,1]), colMeans))
             x         y
a -0.675195494 0.4772696
b  0.270891403 0.5091359
c  0.002756666 0.4053922

Joshua Ulrich 2010-10-04 19:38:05

Neither of those does what I want, and they are essentially the same thing. In fact, the function 'by', which I am using, is simply a wrapper for tapply. The idea is that I give it a data.frame, apply a function over the columns, and get a data.frame or matrix back.

Andrew Redd 2010-10-04 19:42:12

My bad. My example only has one column.

Joshua Ulrich 2010-10-04 19:45:38

Answer 2

+2 A:

Does the aggregate function do what you want?

If not, look at the plyr package; it gives several options for taking things apart, doing computations on the pieces, and then putting them back together again.

You may also be able to do this using the reshape package.
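
For reference, here is a minimal sketch of the aggregate approach, assuming a data.frame with one grouping column and only numeric columns otherwise. The names Data, grp, x and y are made up for illustration and mirror the example in Answer 1; they are not from the question.

```r
# Hypothetical data: one grouping factor and two numeric columns
set.seed(1)
Data <- data.frame(grp = sample(letters[1:3], 20, TRUE),
                   x = rnorm(20),
                   y = runif(20))

# Formula interface: "." stands for every column other than grp,
# so the call does not change as more numeric columns are added
means <- aggregate(. ~ grp, data = Data, FUN = mean)
print(means)
```

The result is a data.frame with one row per level of grp and one column of means per original column.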

Greg Snow 2010-10-04 19:46:49

yes aggregate was what I was looking for thank you.

Andrew Redd 2010-10-04 20:10:54

Answer 3

+2 A:

With plyr:

library(plyr)
# x is your data.frame, id the grouping column, var the column to average
df <- ddply(x, .(id), function(d) data.frame(mean = mean(d$var)))
print(df)

Update:

data<-data.frame(I=as.factor(rep(letters[1:10],each=3)),x=rnorm(30),y=rbinom(30,5,.5))
ddply(data,.(I), function(x) data.frame(x=mean(x$x), y=mean(x$y)))

See, plyr is smart :)

Update 2:

In response to your comment, I believe cast and melt from the reshape package are much simpler for your purpose.

library(reshape)
# melt() treats the factor column I as the id variable by default
cast(melt(data), I ~ variable, mean)

Brandon Bertelsen 2010-10-04 20:49:52

Can this scale to a data.frame with 100 columns? Writing data.frame(x=mean(x$x),...) is not practical. I don't mean to be negative or derogatory, but that is the context of my situation, so I am looking for the best solution that scales up well.

Andrew Redd 2010-10-05 14:35:29

The answer is yes: you have a whole function to work with inside of ddply. However, I think cast and melt are more efficient for this purpose. I have updated my response.
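
One way the ddply route might scale without naming each column is plyr's numcolwise helper, which turns a function into one that is applied to every numeric column. This is only a sketch; the data.frame dat below is hypothetical and simply mirrors the example in the answer.

```r
library(plyr)

# Hypothetical data shaped like the example in the answer
set.seed(1)
dat <- data.frame(I = as.factor(rep(letters[1:10], each = 3)),
                  x = rnorm(30),
                  y = rbinom(30, 5, .5))

# numcolwise(mean) applies mean to each numeric column of every group,
# so no column has to be listed by hand
res <- ddply(dat, .(I), numcolwise(mean))
print(res)
```

The same call should work unchanged on a data.frame with many more numeric columns.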

Brandon Bertelsen 2010-10-05 15:03:16
