In R, given a vector

casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001, 0.5097587, 0.6476510, 0.4552577, 0.5578296, 0.5728478, 0.1927945, 0.2624068, 0.2732615)

and a factor:

trans.factor <- factor(rep(c("t0", "t12", "t24", "t72"), c(4, 3, 3, 3)))

I want to create a plot where the data points are grouped as defined by the factor. The categories should be on the x-axis, and values in the same category should have the same x coordinate.

Simply doing plot(trans.factor, casp6) does almost what I want: it produces a boxplot, but I want to see the individual data points.

+3  A: 
require(ggplot2)
qplot(trans.factor, casp6)
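
A note for readers on recent ggplot2 releases: qplot() is deprecated from ggplot2 3.4.0 onward. A minimal sketch of the same plot with the full ggplot() interface might look like the following. The data frame name `df` is an assumption introduced here for illustration, since ggplot() expects a data frame rather than loose vectors.

```r
library(ggplot2)

casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001,
           0.5097587, 0.6476510, 0.4552577, 0.5578296, 0.5728478,
           0.1927945, 0.2624068, 0.2732615)
trans.factor <- factor(rep(c("t0", "t12", "t24", "t72"), c(4, 3, 3, 3)))

# 'df' is a hypothetical name; it bundles the two vectors for ggplot()
df <- data.frame(trans.factor = trans.factor, casp6 = casp6)

# One point per observation, with the factor levels on a discrete x-axis
p <- ggplot(df, aes(x = trans.factor, y = casp6)) +
  geom_point()
print(p)
```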
Jonathan Chang
Short and sweet, thanks :)
amarillion
+1  A: 

You can do it with ggplot2, using facets. When I read "I want to create a plot where the data points are grouped as defined by the factor", the first thing that came to my mind was facets.

But in this particular case, a faster alternative should be:

plot(as.numeric(trans.factor), casp6)

You can play with the plot options afterwards (type, fg, bg, ...), but I recommend sticking with ggplot2: the code is much cleaner, the functionality is great, and you can avoid overplotting.
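
One thing the as.numeric() call loses is the level names: the x-axis shows the integer codes 1 to 4 instead of t0 to t72. As a sketch in base graphics only, the default axis can be suppressed and redrawn with the level names. The axis labels and the pch value below are illustrative choices, not part of the original answer; stripchart() is shown as a possible alternative that takes the factor directly.

```r
casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001,
           0.5097587, 0.6476510, 0.4552577, 0.5578296, 0.5728478,
           0.1927945, 0.2624068, 0.2732615)
trans.factor <- factor(rep(c("t0", "t12", "t24", "t72"), c(4, 3, 3, 3)))

# Plot the integer codes of the factor, suppressing the numeric x-axis
plot(as.numeric(trans.factor), casp6,
     xaxt = "n", xlab = "trans.factor", ylab = "casp6")
# Redraw the x-axis by hand: one tick per level, labelled with the level name
axis(1, at = seq_along(levels(trans.factor)), labels = levels(trans.factor))

# Alternative: stripchart() draws one column of points per factor level
stripchart(casp6 ~ trans.factor, vertical = TRUE, pch = 1)
```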

Learn how to deal with factors. You got a boxplot when evaluating plot(trans.factor, casp6) because trans.factor is of class factor (ironically, you even named it in such a manner), and it was passed before the continuous (numeric) variable in the plot() call. Hence plot() "feels" the need to subset the data and draw a box for each part (if you pass the continuous variable first, you'll get an ordinary graph, right?). ggplot2, on the other hand, handles the factor differently: given both x and y, qplot() defaults to drawing points, with the factor levels placed along a discrete x-axis (this applies to the syntax provided by Jonathan Chang; you must specify a geom when doing something more complex in ggplot2).

But let's suppose that you have one continuous variable and a factor, and you want to draw a histogram for each part of the continuous variable, as defined by the factor levels. This is where things become complicated with base graphics.
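
The dispatch described above can be checked directly. This is a small sketch, assuming a standard R installation where the graphics package is attached; the variable name `m` is introduced here only for illustration.

```r
casp6 <- c(0.9478638, 0.7477657, 0.9742675, 0.9008372, 0.4873001,
           0.5097587, 0.6476510, 0.4552577, 0.5578296, 0.5728478,
           0.1927945, 0.2624068, 0.2732615)
trans.factor <- factor(rep(c("t0", "t12", "t24", "t72"), c(4, 3, 3, 3)))

class(trans.factor)   # "factor"

# plot() is a generic; a factor as first argument selects the factor method
m <- getS3method("plot", "factor")
is.function(m)

# With a numeric second argument the factor method draws boxplots,
# so these two calls produce the same kind of figure
plot(trans.factor, casp6)
boxplot(casp6 ~ trans.factor)
```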

# create dummy data
set.seed(23)
x <- rnorm(200, 23, 2.3)
g <- factor(round(runif(200, 1, 4)))

Using base graphics (package:graphics):

par(mfrow = c(1, 4))
tapply(x, g, hist)
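
The tapply() call works, but it also returns the list of histogram objects (printed at the console when run interactively) and gives each panel an uninformative title. A slightly longer sketch that labels each panel by its level and restores the layout afterwards could look like this; the names `parts`, `old_par` and `lev` are illustrative.

```r
set.seed(23)
x <- rnorm(200, 23, 2.3)
g <- factor(round(runif(200, 1, 4)))

parts <- split(x, g)                      # one numeric vector per level of g
old_par <- par(mfrow = c(1, nlevels(g)))  # one panel per level, side by side
for (lev in names(parts)) {
  # Title each histogram with the factor level it belongs to
  hist(parts[[lev]], main = paste("g =", lev), xlab = "x")
}
par(old_par)                              # restore the previous layout
```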

ggplot2 way:

qplot(x, facets = . ~ g)

Try to do this with graphics in one line of code (semicolons and custom functions are considered cheating!):

qplot(x, log(x), facets = . ~ g)
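
For comparison, here is a sketch of the same two faceted plots written with the full ggplot() interface, which is the form where the geom has to be named explicitly. The data frame name `d`, the plot object names and the choice of 30 bins are assumptions made for this example.

```r
library(ggplot2)

set.seed(23)
x <- rnorm(200, 23, 2.3)
g <- factor(round(runif(200, 1, 4)))
d <- data.frame(x = x, g = g)   # hypothetical name for the combined data

# Histogram of x, one panel per level of g, panels laid out in a row
p_hist <- ggplot(d, aes(x = x)) +
  geom_histogram(bins = 30) +
  facet_grid(. ~ g)

# Scatter plot of log(x) against x, faceted the same way
p_pts <- ggplot(d, aes(x = x, y = log(x))) +
  geom_point() +
  facet_grid(. ~ g)

print(p_hist)
print(p_pts)
```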

Let's hope that I haven't bored you to death, but helped you!

Kind regards,
aL3xa

aL3xa
A: 

You may be able to get close to what you want using lattice graphics by doing:

library(lattice)    
xyplot(casp6 ~ trans.factor, 
       scales = list(x = list(at = 1:4, labels = levels(trans.factor))))
Greg