Many hours of manic googling and leafing through ggplot2 documentation having brought me no closer, I was hoping someone could maybe nudge me in the right direction.

I have cell count data for a few thousand subjects in a data.frame with the following layout:

  • 1 subject per row.
  • 1 column per cell type (5 total, each holding the percentage value for that cell type, summing to 100%).
  • 2 extra columns: one to indicate which Group (experimental or control) the subject belongs to, and one to indicate which experiment it belongs to (1, 2, 3, 4, etc.)

I would like to generate a ggplot2 jitter plot, percentage along the Y-axis, cell type categories along the X-axis (5 total) and further color the data points based on their Group (experimental or control). It would be great if I could further color the data points from different experiments in shades of the Group color (i.e. Experiment number sort of defining a gradient from light to dark - all Experiment-1 points would be light - either red or blue based on which Group they belonged to), but I don't know if that's even possible.

For starters: is my data even laid out properly to attempt to create this plot? The reason I ask is that I really feel like I'm fighting ggplot2 in attempting to get anything plotted with the data.frame in its current layout (but the native boxplot() seems to work fine with very few modifications...)

Any help or nudges in the right direction would be greatly appreciated.


EDIT:

This is the output of dput(head(dat, 10)).

structure(list(Neutrophils = c(38, 70.7, 62.1, 90.5, 65.8, 39.2, 89.4, 91.3, 55.4, 14.5), Lymphocytes = c(47.5, 17.1, 20.3, 2, 25, 37.1, 6.3, 1.6, 31.3, 61.5), Monocytes = c(12.4, 11.8, 14.6, 4.8, 7.3, 14.1, 3.7, 4.6, 8.4, 21.9), Eosinophils = c(1.4, 0.1, 2.5, 2.4, 1.4, 9.2, 0.1, 2.5, 4.6, 1.3), Basophils = c(0.8, 0.3, 0.5, 0.3, 0.5, 0.4, 0.5, 0, 0.3, 0.8), Group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("Neutrophils", "Lymphocytes", "Monocytes", "Eosinophils", "Basophils", "Group"), row.names = c("B145", "B196", "B212", "B246", "B250", "B286", "B343", "B355", "B369", "B386"), class = "data.frame")

A: 

You first need to reshape your data using the melt function in the reshape package.

I'm sure someone will come along with a more elegant way to color points on a gradient but you can do that manually by creating a new column with the colors matched to group and/or experiment. Then map that color aesthetic to that column.
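As a rough sketch of the reshaping step, the snippet below melts a small stand-in for the question's data and draws a jitter plot coloured by Group. It assumes the reshape2 package (the successor to reshape, whose melt() behaves similarly for this call); the Experiment column and the second Group value are hypothetical, added only for illustration.

```r
library(reshape2)  # assumption: reshape2 rather than the older reshape package
library(ggplot2)

# Stand-in data: the first four subjects from the question.
# Group = 2 and the Experiment column are hypothetical.
dat <- data.frame(
  Neutrophils = c(38, 70.7, 62.1, 90.5),
  Lymphocytes = c(47.5, 17.1, 20.3, 2),
  Monocytes   = c(12.4, 11.8, 14.6, 4.8),
  Eosinophils = c(1.4, 0.1, 2.5, 2.4),
  Basophils   = c(0.8, 0.3, 0.5, 0.3),
  Group       = c(1L, 1L, 2L, 2L),
  Experiment  = c(1L, 2L, 1L, 2L),
  row.names   = c("B145", "B196", "B212", "B246")
)

# Wide -> long: one row per subject and cell type.
# Result columns: Group, Experiment, variable (cell type), value (percentage).
long <- melt(dat, id.vars = c("Group", "Experiment"))

# Cell type on x, percentage on y, colour by Group (as a discrete factor)
p <- ggplot(long, aes(x = variable, y = value, colour = factor(Group))) +
  geom_jitter(width = 0.2, height = 0) +
  labs(x = "Cell type", y = "Percentage", colour = "Group")
```

Printing `p` draws the plot. Setting `height = 0` keeps the jitter horizontal only, so the plotted percentages are not perturbed.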

Maiasaura
Wow, I'm actually amazed at how close that got me. Thank you so much. Melt pretty much did what I'd been trying to do (unsuccessfully) for a few hours now - namely reorganize the data.frame to a format qplot would be happy with. Still need to figure out the coloring, but I'm thinking that should be easier now that I see the "melt-ed" layout.
alsocasey
The coloring is possible too, I think. First create a set of colors mapped to your experiments (say you have 3 experiments): `colors=cbind(colors=c("red","blue","yellow"),experiment=1:3)`. Then merge that with your dataset: `new_data=merge(data,colors,by="experiment")`. You could make that previous step more elaborate by using a group-by-experiment combination. Now in ggplot, just specify `colour=colors` as part of the aesthetic, for example: `ggplot(new_data,aes(x=cell,y=value,colour=colors))`. To have the points drawn in those literal colors you will probably also need to add `scale_colour_identity()`.
Maiasaura
A: 

It would be great if I could further color the data points from different experiments in shades of the Group color (i.e. Experiment number sort of defining a gradient from light to dark - all Experiment-1 points would be light - either red or blue based on which Group they belonged to), but I don't know if that's even possible.

Not currently, sorry.

hadley
Ah, well there goes the final hurdle I was up against - I suppose I could create 1 plot per experiment and overlay them, using light to dark shades of both group colours in successive plots to achieve a similar result?
alsocasey
Why not just use shape? Or you could create your own colour scheme with `scale_colour_manual()`.
hadley
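One way the manual colour scheme might look, as a sketch only: build one factor level per Group/Experiment combination and give each level a hand-picked shade. The data, the column names (`Group`, `Experiment`, `variable`, `value`), and the particular colour ramps are all assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical long-format data: 2 groups x 3 experiments x 2 cell types
long <- expand.grid(
  variable   = c("Neutrophils", "Lymphocytes"),
  Group      = c("control", "experimental"),
  Experiment = 1:3
)
set.seed(1)
long$value <- runif(nrow(long), 0, 100)  # made-up percentages

# One level per Group/Experiment combination, e.g. "control.1"
long$shade <- interaction(long$Group, long$Experiment)

# Light-to-dark ramp per group, named to match the interaction levels
n_exp <- length(unique(long$Experiment))
blues <- colorRampPalette(c("lightblue", "darkblue"))(n_exp)
reds  <- colorRampPalette(c("pink", "darkred"))(n_exp)
shades <- c(
  setNames(blues, paste("control", 1:n_exp, sep = ".")),
  setNames(reds,  paste("experimental", 1:n_exp, sep = "."))
)

p <- ggplot(long, aes(x = variable, y = value, colour = shade)) +
  geom_jitter(width = 0.2, height = 0) +
  scale_colour_manual(values = shades)
```

Because `values` is a named vector, each shade is matched to its level by name rather than by position, so the order of the interaction levels does not matter.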
How about using alpha transparency? Just need to scale the experiment number appropriately (divide by the max should be fine)
James
In fact, you don't even need to bother scaling, just add `alpha=experiment` to your qplot call.
James
This works and essentially achieves what I was looking for (Thank you!). I'm failing to find a clever way to skew the alpha towards the opaque side though - with 4 experiments, the points from experiments 1 and 2 sit at 25% and 50%, which is a little on the light side.
alsocasey
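A possible way to push the alpha values towards the opaque end, sketched under assumptions: ggplot2's `scale_alpha()` takes a `range` argument giving the lowest and highest alpha to use, so the lightest experiment can start at, say, 0.4 instead of near-transparent. The data frame and column names below are hypothetical.

```r
library(ggplot2)

# Hypothetical long-format data with 4 experiments
long <- data.frame(
  variable   = rep(c("Neutrophils", "Lymphocytes"), each = 4),
  value      = c(38, 70.7, 62.1, 90.5, 47.5, 17.1, 20.3, 2),
  Group      = factor(rep(c(1, 2), times = 4)),
  Experiment = rep(1:4, times = 2)
)

# Map Experiment to alpha, but restrict the output range to 0.4-1
p <- ggplot(long, aes(x = variable, y = value,
                      colour = Group, alpha = Experiment)) +
  geom_jitter(width = 0.2, height = 0) +
  scale_alpha(range = c(0.4, 1), breaks = 1:4)

# The same linear rescaling written out by hand, for reference:
# experiments 1..4 are spread evenly between 0.4 and 1
alpha_vals <- 0.4 + (1 - 0.4) * (1:4 - 1) / (4 - 1)
```

With this range, experiments 1 to 4 should land at roughly 0.4, 0.6, 0.8 and 1.0; raising the lower bound makes the early experiments darker at the cost of less contrast between them.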