We all love robust measures like medians and interquartile ranges, but let's face it: in many fields, boxplots almost never show up in published articles, while means and standard errors appear all the time.

It's simple in lattice, ggplot2, etc. to draw boxplots, and the galleries are full of them. Is there an equally straightforward way to draw means and standard errors, conditioned on a categorical variable?

I'm talking about plots like these:

http://freakonomics.blogs.nytimes.com/2008/07/30/how-big-is-your-halo-a-guest-post/

Or what are called "means diamonds" in JMP (see Figure 3):

http://blogs.sas.com/jmp/index.php?/archives/127-What-Good-Are-Error-Bars.html

+6  A: 
Shane
you just beat me to this one! I read the www.imachordata.com post yesterday and even emailed it to a former coworker.
JD Long
It's a small world in the R blogosphere. :) I recently started following planet R (http://planetr.stderr.org/), and it's a bit overwhelming.
Shane
I need to stop being lazy and start maintaining an R blog list.
JD Long
Pretty good answer, though those are SDs not SEs. It's a pity the "bar w/ SE plot" can't be drawn in one straightforward call like the boxplot can.
Dan Goldstein
That's a good point about the SD/SE (I was just showing how to plot it). If you look at the geom_errorbar documentation, you will see that it doesn't take too many steps to produce. Incidentally, I don't see any evidence that R can produce "means diamonds" right now.
Shane
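On the SD versus SE point raised in the comments above: the standard error of a mean is the standard deviation divided by the square root of the group size. Here is a minimal base-R sketch of that calculation per category. The data are made up, and the names `group`, `scores`, and `se` are only illustrative; none of them come from the thread.

```r
# Hypothetical example data: one numeric response and one categorical factor
set.seed(1)
group  <- factor(rep(c("A", "B", "C"), each = 10))
scores <- rnorm(30, mean = rep(c(75, 34, 19), each = 10), sd = 10)

# Standard error of the mean: sd / sqrt(n) (helper name is illustrative)
se <- function(x) sd(x) / sqrt(length(x))

# Summaries conditioned on the categorical variable
group.mean <- tapply(scores, group, mean)
group.sd   <- tapply(scores, group, sd)
group.se   <- tapply(scores, group, se)

print(cbind(mean = group.mean, sd = group.sd, se = group.se))
```

With 10 observations per group, each SE here is the corresponding SD divided by sqrt(10), so error bars drawn from SDs would be roughly three times longer than SE bars.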
A: 

ggplot produces aesthetically pleasing graphs, but I don't have the gumption to try to publish any ggplot output yet.

Until that day comes, here is how I have been making the aforementioned graphs. I use a graphics package called 'gplots' to draw the standard error bars (using values I've already calculated). Note that this code allows for two or more factors within each class/category. This requires the data to be supplied as a matrix, and the "beside=TRUE" argument to the "barplot2" function keeps the bars from being stacked.

# Create the data (means) matrix
# Using the matrix accommodates two or more factors for each class

data.m <- matrix(c(75,34,19, 39,90,41), nrow = 2, ncol=3, byrow=TRUE,
               dimnames = list(c("Factor 1", "Factor 2"),
                                c("Class A", "Class B", "Class C")))

# Create the standard error matrix

error.m <- matrix(c(12,10,7, 4,7,3), nrow = 2, ncol = 3, byrow=TRUE)

# Load the gplots package, which provides barplot2()

library(gplots)

# Plot the bar graph, with standard errors
# ci.u and ci.l are the upper and lower ends of each error bar

barplot2(data.m, beside=TRUE, axes=TRUE, las=1, ylim = c(0,120),
         main=" ", sub=" ", col=c("gray20",0),
         xlab="Class", ylab="Total amount (Mean +/- s.e.)",
         plot.ci=TRUE, ci.u=data.m+error.m, ci.l=data.m-error.m, ci.lty=1)

# Now, give it a legend:

legend("topright", c("Factor 1", "Factor 2"), fill=c("gray20",0),box.lty=0)

It is pretty plain-Jane, aesthetically, but seems to be what most journals/old professors want to see.

I'd post the graph produced by these example data, but this is my first post on the site. Sorry. One should be able to copy-paste the whole thing (after installing the "gplots" package) without problem.
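If the means and standard errors are not already calculated, or gplots is not installed, a similar plot can be sketched in base R alone. This is only one possible approach, not the method used above: the raw data are simulated, and the names `raw`, `se`, `mean.m`, `se.m`, and `mids` are hypothetical. It relies on `barplot()` returning the bar midpoints and on `arrows()` with `angle = 90` to draw flat-capped error bars.

```r
# Hypothetical raw data: 2 factors x 3 classes, 8 observations per cell
set.seed(42)
raw <- expand.grid(rep    = 1:8,
                   Factor = c("Factor 1", "Factor 2"),
                   Class  = c("Class A", "Class B", "Class C"))
raw$y <- rnorm(nrow(raw), mean = 50, sd = 15)

# Standard error of the mean (helper name is illustrative)
se <- function(x) sd(x) / sqrt(length(x))

# Rows = Factor, columns = Class, the same layout as data.m and error.m
mean.m <- with(raw, tapply(y, list(Factor, Class), mean))
se.m   <- with(raw, tapply(y, list(Factor, Class), se))

# With beside = TRUE, barplot() returns the bar midpoints as a matrix
mids <- barplot(mean.m, beside = TRUE, las = 1,
                ylim = c(0, max(mean.m + se.m) * 1.2),
                col = c("gray20", "white"),
                xlab = "Class", ylab = "Mean +/- s.e.")

# Draw the error bars as double-headed arrows with flat (90 degree) caps
arrows(mids, mean.m - se.m, mids, mean.m + se.m,
       angle = 90, code = 3, length = 0.05)

legend("topright", rownames(mean.m), fill = c("gray20", "white"), box.lty = 0)
```

Because `tapply()` with two grouping variables returns a matrix with one row per factor and one column per class, `mean.m` and `se.m` could also be passed to `barplot2()` in place of the hand-typed `data.m` and `error.m`.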

devanmcg