Is there a way to create a boxplot in R that displays an "N=(sample size)" label somewhere on or next to each box? The varwidth argument adjusts the width of each box on the basis of sample size, but the widths are only relative within one plot, so that doesn't allow comparisons between different plots.

FWIW, I am using the boxplot command in the following fashion, where 'f1' is a factor:

boxplot(xvar ~ f1, data=frame, xlab="input values", horizontal=TRUE)
+2  A: 

You can use the names parameter to write the n next to each factor name.
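If you do want to calculate the n yourself, table() gives the count per factor level. A minimal sketch, assuming the built-in InsectSprays data as a stand-in for the question's data (count plays the role of xvar, spray the role of f1):

```r
# InsectSprays ships with R and is used here only as a stand-in
# for the asker's data frame: 'count' ~ xvar, 'spray' ~ f1.
n <- table(InsectSprays$spray)               # rows per factor level
labels <- paste0(names(n), " (n=", as.vector(n), ")")

boxplot(count ~ spray, data = InsectSprays,
        xlab = "input values", horizontal = TRUE,
        names = labels, las = 1)             # las = 1 keeps the labels upright
```

Note that table() counts rows, so if the plotted variable contains NAs the result can differ from the number of values boxplot() actually uses.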

If you don't want to calculate the n yourself you could use this little trick:

# Compute the boxplot statistics without drawing anything
b <- boxplot(xvar ~ f1, data=frame, plot=FALSE)
# Now b$n holds the counts for each factor level; write them into the names
boxplot(xvar ~ f1, data=frame, xlab="input values", names=paste0(b$names, " (n=", b$n, ")"))
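boxplot() also returns that same list (invisibly) when it does draw, so the counts can go into the plot itself instead of the group names. A rough sketch that writes them along the right-hand axis of a horizontal boxplot, assuming the built-in InsectSprays data in place of the question's data:

```r
# InsectSprays ships with R; it stands in for the asker's data frame.
op <- par(mar = c(5, 4, 4, 4) + 0.1)   # widen the right margin for the labels

# Draw the plot and keep the invisibly returned statistics, including $n
b <- boxplot(count ~ spray, data = InsectSprays,
             xlab = "input values", horizontal = TRUE)

# Horizontal boxes sit at y = 1, 2, ..., so label those positions on side 4
axis(side = 4, at = seq_along(b$n), labels = paste0("n=", b$n),
     tick = FALSE, las = 1)

par(op)                                 # restore the previous margins
```

Because the labels are absolute counts rather than relative widths, they stay comparable from one plot to the next.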
nico
Pretty slick! Thanks for the trick.
J Miller
+2  A: 

Here's some ggplot2 code. It's going to display the sample size at the sample mean, making the label multifunctional!

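First, a simple function for fun.data:

# fun.data is documented as returning a data frame, so return one row:
# y is where the label is drawn, label is the text to draw.
give.n <- function(x){
   data.frame(y = mean(x), label = length(x))
}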

Now, to demonstrate with the diamonds data:

library(ggplot2)

ggplot(diamonds, aes(cut, price)) + 
   geom_boxplot() + 
   stat_summary(fun.data = give.n, geom = "text")

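You may have to play with the text size to make it look good, but now you have a label for the sample size which also gives a sense of the skew: the label sits at the mean, so its offset from the median line shows which way the distribution leans.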
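As a starting point for that tuning, here is a sketch with a smaller font and a slight upward nudge. The helper give.n.label is a hypothetical variant introduced only for this example (it prefixes the count with "n="), and the size and vjust values are guesses to adjust by eye:

```r
library(ggplot2)

# Hypothetical variant of give.n: same position (the mean),
# but the label reads "n=..." instead of a bare number.
give.n.label <- function(x) {
  data.frame(y = mean(x), label = paste0("n=", length(x)))
}

p <- ggplot(diamonds, aes(cut, price)) +
  geom_boxplot() +
  stat_summary(fun.data = give.n.label, geom = "text",
               size = 3,       # text size in mm; smaller than the default
               vjust = -0.5)   # shift the label slightly above the mean
print(p)
```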

JoFrhwld
Works great, and looks beautiful. Thanks!
J Miller
What if I'm ggplot-ing with `geom_boxplot(aes(fill=factor(f2)))` where f2 is a second factor - is there a variation on stat_summary that allows for the 'sub boxes' to receive their own N?
J Miller
Example code to save space: `ggplot(mpg, aes(manufacturer, hwy, fill = factor(year))) + geom_boxplot() + stat_summary(fun.data = give.n, geom = "text", position = position_dodge(width = 0.75), size = 3)` You may have to manually adjust the value passed to `width` in `position_dodge()` so the labels line up with the boxes. (Current versions of `position_dodge()` have no `height` argument, so only `width` is passed.)
JoFrhwld