
I'm trying to understand how to have more than one series on a plot, using the following data.

library(ggplot2)

Year <- c('1950', '1960', '1970', '1980')
Bus <- c(10, 20, 30, 40)
Bus.sd <- c(1.1, 2.2, 3.3, 4.4)
Car <- c(20, 20, 40, 40)
Car.sd <- c(1.1, 2.2, 3.3, 4.4)

sample_data = data.frame(Year, Bus, Bus.sd, Car, Car.sd)

qplot(Year, Bus, data=sample_data, geom="pointrange",
      ymin = Bus - Bus.sd/2, ymax = Bus + Bus.sd/2)

For example, using the above data, how do I show both sample_data$Bus and sample_data$Car on the same plot in different colors?

What I tried doing was:

p <- qplot(...)

then

p <- p + qplot(...) 

where I replicated the previous line, but this gave me an error.

I don't fully understand how aes() works. I have studied the ggplot2 examples, but I have trouble applying them here. Alternatively, if it is possible to make a stacked bar chart (geom_bar) from this data, I think that would also represent it well.
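The `p + qplot(...)` attempt most likely errors because qplot() already returns a complete plot object, and `+` on a ggplot object expects components such as layers (`geom_*()`), scales or themes, not a second plot. A minimal sketch of the layering idea with the question's wide data, assuming a reasonably recent ggplot2: build the plot with ggplot() and add one geom_pointrange() per series. Mapping colour to a constant label ("Bus", "Car") inside aes() is one common way to get a legend from wide data, not the only one.

```r
library(ggplot2)

Year <- c('1950', '1960', '1970', '1980')
Bus <- c(10, 20, 30, 40)
Bus.sd <- c(1.1, 2.2, 3.3, 4.4)
Car <- c(20, 20, 40, 40)
Car.sd <- c(1.1, 2.2, 3.3, 4.4)
sample_data <- data.frame(Year, Bus, Bus.sd, Car, Car.sd)

# Add layers, not plots: one pointrange layer per series.
# A constant string mapped to colour inside aes() yields one legend entry per layer.
p <- ggplot(sample_data, aes(x = Year)) +
  geom_pointrange(aes(y = Bus, ymin = Bus - Bus.sd / 2, ymax = Bus + Bus.sd / 2,
                      colour = "Bus")) +
  geom_pointrange(aes(y = Car, ymin = Car - Car.sd / 2, ymax = Car + Car.sd / 2,
                      colour = "Car")) +
  labs(colour = "Vehicle")
print(p)
```

This works, but every new series means another hand-written layer, which is why reshaping to long format and mapping colour to a single column scales better.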

A: 

I hope this helps.

ggplot2 works best with data in long format, like so:

  Year score  sd variable
1 1950    10 1.1      bus
2 1960    20 2.2      bus
3 1970    30 3.3      bus
4 1980    40 4.4      bus
5 1950    20 1.1      car
6 1960    20 2.2      car
7 1970    40 3.3      car
8 1980    40 4.4      car

This will get the data into R:

data <- structure(list(Year = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L), class = "factor", .Label = c("1950", "1960", "1970", "1980"
)), score = c(10, 20, 30, 40, 20, 20, 40, 40), sd = c(1.1, 2.2, 
3.3, 4.4, 1.1, 2.2, 3.3, 4.4), variable = c("bus", "bus", "bus", 
"bus", "car", "car", "car", "car")), .Names = c("Year", "score", 
"sd", "variable"), row.names = c(NA, -8L), class = "data.frame")
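Rather than typing the structure() call by hand, the same long table can be built from the question's wide `sample_data`. A minimal sketch in base R (no reshape package assumed); the name `long` is just illustrative. Each vehicle contributes one block of rows: its value column becomes `score`, its matching `.sd` column becomes `sd`, and `variable` records which vehicle the row came from.

```r
Year <- c('1950', '1960', '1970', '1980')
Bus <- c(10, 20, 30, 40)
Bus.sd <- c(1.1, 2.2, 3.3, 4.4)
Car <- c(20, 20, 40, 40)
Car.sd <- c(1.1, 2.2, 3.3, 4.4)
sample_data <- data.frame(Year, Bus, Bus.sd, Car, Car.sd)

# Stack the bus rows on top of the car rows, renaming columns to a common schema
long <- rbind(
  data.frame(Year = sample_data$Year, score = sample_data$Bus,
             sd = sample_data$Bus.sd, variable = "bus"),
  data.frame(Year = sample_data$Year, score = sample_data$Car,
             sd = sample_data$Car.sd, variable = "car")
)
long$Year <- factor(long$Year)  # treat years as discrete categories, as in the answer
long
```

With more vehicles, the two data.frame() calls could be generated with lapply() over the vehicle names; the hand-written form just keeps the mapping explicit.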

And this will make the plot, with dodging included. You probably need the dodge because your points overlap (bus and car have the same score in 1960 and in 1980). You can control the amount of dodging with the width argument to position_dodge().

library(ggplot2)
# colour by vehicle; dodge so overlapping bus/car ranges stay visible
ggplot(data, aes(x = Year, y = score, colour = variable)) +
  geom_pointrange(aes(ymin = score - sd, ymax = score + sd),
                  position = position_dodge(width = 0.2))
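To see what the dodge actually does, one can compare the x positions ggplot2 computes with and without it. A small sketch, assuming a ggplot2 version that exports layer_data(); the helper `make_plot()` is hypothetical and only exists to build the same plot with different position adjustments. Dodging works per group: because colour is mapped to the discrete `variable`, bus and car fall into different groups at each year and get shifted apart.

```r
library(ggplot2)

# Same long-format data as in the answer, typed out directly
long <- data.frame(
  Year = factor(rep(c("1950", "1960", "1970", "1980"), 2)),
  score = c(10, 20, 30, 40, 20, 20, 40, 40),
  sd = rep(c(1.1, 2.2, 3.3, 4.4), 2),
  variable = rep(c("bus", "car"), each = 4)
)

# Hypothetical helper: identical plot, only the position adjustment differs
make_plot <- function(pos) {
  ggplot(long, aes(x = Year, y = score, colour = variable)) +
    geom_pointrange(aes(ymin = score - sd, ymax = score + sd), position = pos)
}

# x positions after the position adjustment has been applied
x_plain <- as.numeric(layer_data(make_plot(position_identity()))$x)
x_dodged <- as.numeric(layer_data(make_plot(position_dodge(width = 0.2)))$x)

length(unique(x_plain))   # bus and car share each year's x position
length(unique(x_dodged))  # each vehicle gets its own offset within a year
```

A larger width pushes the two ranges further apart; a smaller one brings them closer together.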
Andreas
What does long format mean? Is it like a stacked vector?
celenius
See the "data" example. "Bus" and "Car" are not two different variables, but two levels of the same imaginary variable, e.g. "vehicle". The same goes for the values and their standard deviations: they become a single score column and a single sd column.
Andreas
Hadley's melt() and cast() functions (from his reshape package) are very good at transforming datasets into different forms.
Andreas
Oh, I see. Thanks. I think this is why I have so much trouble understanding the aes() part. I don't understand how you structured the data frame, though; I'll look at melt and cast.
celenius
I made the data frame manually. My melt and cast skills are somewhat limited, so that was easier in this example. If you are given data in the wide format, maybe ask another SO question on how to melt it.
Andreas
Since you have overlapping points, you also should look at position_dodge.
Andreas
I just want to second that: ggplot() is much easier to deal with than qplot() in the long run. qplot() is great for taking a quick peek at very standard stuff (scatterplot! boxplot!), but once you're putting multiple layers on, ggplot() is the way to go.
Matt Parker