Suppose (to simplify) I have a table containing some control vs. treatment data:

Which, Color, Response, Count
Control, Red, 2, 10
Control, Blue, 3, 20
Treatment, Red, 1, 14
Treatment, Blue, 4, 21

For each color, I want a single row with the control and treatment data, i.e.:

Color, Response.Control, Count.Control, Response.Treatment, Count.Treatment
Red, 2, 10, 1, 14
Blue, 3, 20, 4, 21

I guess one way of doing this is to split the data into control and treatment subsets and do an inner merge of the two on the Color column, but is there a better way? I was thinking the reshape package or the stack function could somehow do it, but I'm not sure how.
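For reference, the merge approach described above could be sketched in base R as follows. This is only an illustration built on the sample table; the data frame name df and the intermediate names control, treatment and wide are assumptions, not part of the original data.

```r
# Sample data from the question (the name df is an assumption).
df <- data.frame(
  Which    = c("Control", "Control", "Treatment", "Treatment"),
  Color    = c("Red", "Blue", "Red", "Blue"),
  Response = c(2, 3, 1, 4),
  Count    = c(10, 20, 14, 21)
)

# Split into the two groups, dropping the now-constant Which column.
control   <- subset(df, Which == "Control",   select = -Which)
treatment <- subset(df, Which == "Treatment", select = -Which)

# Inner merge on Color; the suffixes label which group each column came from.
wide <- merge(control, treatment, by = "Color",
              suffixes = c(".Control", ".Treatment"))
print(wide)
```

By default merge() sorts the result by the key, so the rows come back ordered by Color rather than in the original order.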

+2  A: 

The cast function from the reshape package (not to be confused with the reshape function in base R) can do this and many other things. See here: http://had.co.nz/reshape/

Zack
Oh, yep, I meant the reshape package. I wasn't sure how to use cast to do what I want, though. One approach that comes close is cast(data, position ~ variable | which), but this puts the control/treatment data in a list rather than in columns.
grautur
I don't remember exactly, but I'm pretty sure the reshape package's documentation has an example of doing exactly this. You have to tell it how you want the column names constructed.
Zack
+4  A: 

Using the reshape package.

First, melt your data.frame:

library(reshape)
x <- melt(df, id.vars = c("Which", "Color"))  # one row per Which/Color/variable

Then cast:

cast(x, Color ~ Which + variable)

Voila.
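Put together with the sample table from the question, the two steps might look like the sketch below. It assumes the reshape package is installed and that the data frame is called df; the id.vars argument is spelled out here only to make the grouping columns explicit.

```r
library(reshape)

# Sample data from the question (the name df is an assumption).
df <- data.frame(
  Which    = c("Control", "Control", "Treatment", "Treatment"),
  Color    = c("Red", "Blue", "Red", "Blue"),
  Response = c(2, 3, 1, 4),
  Count    = c(10, 20, 14, 21)
)

# Long format: one row per (Which, Color, variable) with a single value column.
x <- melt(df, id.vars = c("Which", "Color"))

# Wide format: one row per Color, one column per Which/variable combination.
wide <- cast(x, Color ~ Which + variable)
print(wide)
```

The wide columns should come out named by joining Which and variable (for example Control_Response) rather than in the Response.Control style shown in the question, so a names(wide) <- ... step may still be needed afterwards.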

Brandon Bertelsen
A: 

Reshape does indeed work for pivoting a skinny data frame (e.g., from a simple SQL query) to a wide matrix, and it is very flexible, but it is slow, and on large amounts of data it becomes very slow. Fortunately, if you only want to pivot to a fixed shape, it's fairly easy to write a little C function to do the pivot fast.

In my case, pivoting a skinny data frame with 3 columns and 672,338 rows took 34 seconds with reshape, 25 seconds with my R code, and 2.3 seconds with C. Ironically, the C implementation was probably easier to write than my (tuned for speed) R implementation.
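Neither implementation is shown here, so the following is not that code. It is only a minimal sketch of one common way to do a fixed-shape pivot in plain R, using matrix indexing; the function name pivot_matrix and the example columns id, key and value are hypothetical.

```r
# Pivot three parallel vectors (row key, column key, value) into a wide matrix.
# Assumes numeric values; if a (row, column) pair repeats, the last value wins.
pivot_matrix <- function(rows, cols, values) {
  row_keys <- unique(rows)
  col_keys <- unique(cols)
  out <- matrix(NA_real_,
                nrow = length(row_keys), ncol = length(col_keys),
                dimnames = list(as.character(row_keys),
                                as.character(col_keys)))
  # A two-column index matrix assigns every cell in one vectorised step.
  out[cbind(match(rows, row_keys), match(cols, col_keys))] <- values
  out
}

# Tiny skinny data frame standing in for the 3-column query result.
skinny <- data.frame(
  id    = c("a", "a", "b"),
  key   = c("x", "y", "x"),
  value = c(1, 2, 3)
)
m <- pivot_matrix(skinny$id, skinny$key, skinny$value)
print(m)
```

Combinations that never occur in the input are left as NA in the result.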

atp