I need to summarize a data frame by some variables, ignoring the others. This is sometimes referred to as collapsing. For example, if I have a data frame like this:
Widget Type Energy
egg 1 20
egg 2 30
jap 3 50
jap 1 60

Then collapsing by Widget, with Energy as the dependent variable (Energy~Widget), would yield the mean Energy for each Widget:
Widget Energy
egg 25
jap 55

In Excel the closest functionality might be "Pivot tables". I've worked out how to do it in Python (http://alexholcombe.wordpress.com/2009/01/26/summarizing-data-by-combinations-of-variables-with-python/), and here's an example in R that uses the doBy library to do something closely related (http://www.mail-archive.com/[email protected]/msg02643.html). But is there an easy way to do the above? Even better, is there anything built into the ggplot2 library to create plots that collapse across some variables?

+6  A: 

Use aggregate to summarize across a factor:

> df<-read.table(textConnection('
+ egg 1 20
+ egg 2 30
+ jap 3 50
+ jap 1 60'))
> aggregate(df$V3,list(df$V1),mean)
  Group.1  x
1     egg 25
2     jap 55
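
The same call can also be written with the formula interface of aggregate, which matches the Energy~Widget notation from the question and keeps the column names in the result. This is a minimal sketch; it assumes the columns are named Widget, Type and Energy as in the question (the transcript above uses the default names V1 to V3).

```r
# Example data from the question, with named columns (an assumption;
# the transcript above relies on the default names V1, V2, V3).
df <- data.frame(
  Widget = c("egg", "egg", "jap", "jap"),
  Type   = c(1, 2, 3, 1),
  Energy = c(20, 30, 50, 60)
)

# Mean of Energy within each level of Widget; Type is ignored.
res <- aggregate(Energy ~ Widget, data = df, FUN = mean)
print(res)
#   Widget Energy
# 1    egg     25
# 2    jap     55
```

To collapse by more than one variable, add it to the right-hand side, e.g. Energy ~ Widget + Type.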

For more flexibility look at the tapply function and the plyr package.
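
As a rough illustration of the tapply route, here is a sketch on the same example data. The column names Widget and Energy are assumed from the question rather than taken from the transcript above.

```r
# Example data from the question (column names are an assumption).
df <- data.frame(
  Widget = c("egg", "egg", "jap", "jap"),
  Type   = c(1, 2, 3, 1),
  Energy = c(20, 30, 50, 60)
)

# tapply splits Energy by Widget and applies mean to each group.
# The result is a named array rather than a data frame.
means <- tapply(df$Energy, df$Widget, mean)
print(means)

# Individual group means can be looked up by name.
egg_mean <- means[["egg"]]
```

Unlike aggregate, tapply returns a named array, which is convenient for lookups but needs converting if a data frame is wanted.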

In ggplot2, use stat_summary to summarize:

qplot(V1,V3,data=df,stat="summary",fun.y=mean,geom='bar',width=0.4)
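
In more recent ggplot2 releases the fun.y argument has been deprecated in favour of fun (as of version 3.3.0, to the best of my knowledge), and qplot is no longer the recommended entry point. A sketch of what should be an equivalent plot with the ggplot interface, assuming the columns are named Widget and Energy as in the question:

```r
library(ggplot2)

# Example data from the question (column names are an assumption).
df <- data.frame(
  Widget = c("egg", "egg", "jap", "jap"),
  Type   = c(1, 2, 3, 1),
  Energy = c(20, 30, 50, 60)
)

# stat_summary computes the mean of Energy for each Widget and
# draws one bar per group; Type is collapsed over.
p <- ggplot(df, aes(x = Widget, y = Energy)) +
  stat_summary(fun = mean, geom = "bar", width = 0.4)

# print(p) would render the plot; layer_data() exposes the summarized values.
summarized <- layer_data(p)
```

On older ggplot2 versions, replacing fun with fun.y should give the same result.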
Jyotirmoy Bhattacharya
`by` also comes in handy from time to time.
Jonathan Chang
True. Also `ave`.
Jyotirmoy Bhattacharya
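
For reference, a small sketch of how `by` and `ave` behave on the example data from the question (the column names Widget and Energy are assumed). The main difference is that `ave` returns one value per original row instead of one per group.

```r
# Example data from the question (column names are an assumption).
df <- data.frame(
  Widget = c("egg", "egg", "jap", "jap"),
  Type   = c(1, 2, 3, 1),
  Energy = c(20, 30, 50, 60)
)

# by: applies mean to Energy within each Widget, one result per group.
group_means <- by(df$Energy, df$Widget, mean)

# ave: returns a vector as long as the input, where each element is the
# mean of its own group (FUN defaults to mean).
row_means <- ave(df$Energy, df$Widget)

# Handy for adding a per-group summary as a new column.
df$EnergyMean <- row_means
```

This makes `ave` a good fit when the group mean is needed alongside the raw rows, for example to centre each value on its group mean.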
For an example using `ddply` from the `plyr` package, take a look at this related question: http://stackoverflow.com/questions/2473659/r-what-are-the-best-functions-to-deal-with-concatenating-and-averaging-values-in
mropa
I would check out plyr for a general-purpose split-apply-combine (SAC) framework, which is what pivot tables are. It's an excellent resource.
Dan
Thanks! It's taken me a while to get my head around ggplot2 stats. Here's exactly what I wanted, I think: ggplot(data=df,aes(x=Widget,y=Energy))+ stat_summary(fun.y=mean,ymin=10,ymax=60)
Alex Holcombe
Do you really want the long line? Otherwise `stat_summary(fun.y=mean,geom='point')` produces just the points.
Jyotirmoy Bhattacharya
@jmoy: Thanks, I was wondering how to get rid of the lines!
Alex Holcombe