Hadley turned me on to the plyr package and I find myself using it all the time to do 'group by' sort of stuff. But I find myself having to always rename the resulting columns since they default to V1, V2, etc.

Here's an example:

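library(plyr)
# 72 rows: two columns of random normals plus a state label (24 rows each of A, B, C)
mydata <- data.frame(matrix(rnorm(144, mean=2, sd=2),72,2),c(rep("A",24),rep("B",24),rep("C",24)))
colnames(mydata) <- c("x_value", "acres",  "state")
# total acres per state; the result column comes back named V1
groupAcres <- ddply(mydata, c("state"), function(df) c(sum(df$acres)))
# the manual rename I'd like to avoid
colnames(groupAcres) <- c("state","stateAcres")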

Is there a way to make ddply name the resulting column for me so I can omit that last line?

+3  A: 

This seems to work:

> groupAcres <- ddply(mydata, c("state"), function(df) c(myName=sum(df$acres)))
> groupAcres
  state   myName
1     A 56.87973
2     B 57.84451
3     C 52.82415
Christopher DuBois
I muddle through R syntax without really understanding it. Why on earth does one need the concatenate function?
Farrel
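On the c() question, here is a minimal sketch, assuming plyr is installed and using data shaped like the question's example. With a single argument, c() is not really concatenating anything; it is just a compact way to attach a name to the value, and ddply appears to reuse that name as the column name. The column name total_acres and the set.seed call are illustrative additions, not part of the original answer.

```r
library(plyr)

# Rebuild data shaped like the question's example (values are random)
set.seed(1)
mydata <- data.frame(
  x_value = rnorm(72, mean = 2, sd = 2),
  acres   = rnorm(72, mean = 2, sd = 2),
  state   = rep(c("A", "B", "C"), each = 24)
)

# A bare number carries no name, so ddply falls back to V1
unnamed <- ddply(mydata, "state", function(df) sum(df$acres))
names(unnamed)

# c(name = value) returns a length-one *named* vector
named_value <- c(total_acres = 10)
names(named_value)

# ddply picks that name up as the column name
named <- ddply(mydata, "state", function(df) c(total_acres = sum(df$acres)))
names(named)
```

So the c() is only there to carry the name; any other way of returning a named value should work as well.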
+6  A: 

Use summarise (or summarize):

  groupAcres <- ddply(mydata, "state", summarise, 
     myName = sum(acres))
hadley
That is an excellent way to solve this. I chose Chris's answer only because it's more general. I'll use both his method and yours in the future. I wish I could combine them or accept them both.
JD Long
My method is actually slightly more general (because if you return multiple values they can have different types). I wrote summarise for exactly this use.
hadley
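To illustrate the point about types, here is a small sketch under the same assumptions as the question (plyr installed, a data frame with acres and state columns). The column names total and unit are made up for the example. Returning several values through c() forces them into one atomic vector, so a number mixed with a string stops being numeric, whereas summarise builds each column separately.

```r
library(plyr)

set.seed(1)
mydata <- data.frame(
  acres = rnorm(72, mean = 2, sd = 2),
  state = rep(c("A", "B", "C"), each = 24)
)

# c() builds a single atomic vector, so mixing a number and a string
# coerces the number; total is no longer numeric in the result
by_vector <- ddply(mydata, "state", function(df)
  c(total = sum(df$acres), unit = "acres"))
is.numeric(by_vector$total)

# summarise evaluates each expression as its own column,
# so total stays numeric alongside the text column
by_summarise <- ddply(mydata, "state", summarise,
  total = sum(acres),
  unit  = "acres")
is.numeric(by_summarise$total)
```

If every returned value has the same type, the two approaches should give equivalent results; the difference only shows up with mixed types.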