I have a data frame that I would like to reshape from long to wide format, with the time embedded in the variable names of the wide format. Here is an example data set in long format:

id <- as.numeric(rep(1,16))
time <- rep(c(5,10,15,20), 4)
varname <- c(rep("var1",4), rep("var2", 4), rep("var3", 4), rep("var4", 4))
value <- rnorm(16)
tmpdata <- as.data.frame(cbind(id, time, varname, value))

> tmpdata
id time varname              value
1    5    var1  0.713888426169224
1   10    var1   1.71483653545922
1   15    var1  -1.51992072577836
1   20    var1  0.556992407683219
....
4   20    var4   1.03752019932467

I would like this in a wide format with the following output:

id var1.5 var1.10 var1.15 var1.20 ....
1  0.71   1.71    -1.51   0.55 

(and so on)

I've tried the reshape function in base R without success, and I'm not sure how to accomplish this with the reshape package, as all of the examples keep time as a separate variable in the wide format. Any ideas?

+1  A: 

Why not just paste varname and time together before you reshape?
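
A minimal sketch of that idea in base R, assuming the data frame is built with data.frame() so that value stays numeric. The example values and the helper column name `key` are illustrative and not taken from the question:

```r
# Toy long-format data shaped like the question's example (values are placeholders)
tmpdata <- data.frame(
  id = 1,
  time = rep(c(5, 10, 15, 20), 4),
  varname = rep(paste0("var", 1:4), each = 4),
  value = seq_len(16) / 10
)

# Combine variable name and time into a single key, e.g. "var1.5"
tmpdata$key <- paste(tmpdata$varname, tmpdata$time, sep = ".")

# Reshape to wide using the combined key as the "time" variable
wide <- reshape(tmpdata[, c("id", "key", "value")],
                timevar = "key", idvar = "id", direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix
names(wide) <- sub("^value\\.", "", names(wide))
```

reshape() names the new columns value.<key>, so the last line is what leaves var1.5, var1.10, and so on.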

frankc
+1  A: 

I had to do it in two reshape steps. The column names may not be exactly what you needed, but they can be renamed easily.

id <- as.numeric(rep(1, 16))
time <- rep(c(5,10,15,20), 4)
varname <- c(rep("var1",4), rep("var2", 4), rep("var3", 4), rep("var4", 4))
value <- rnorm(16)
tmpdata <- as.data.frame(cbind(id, time, varname, value))

first <- reshape(tmpdata, timevar="time", idvar=c("id", "varname"), direction="wide")
second <- reshape(first, timevar="varname", idvar="id", direction="wide") 

And the output:

> tmpdata
   id time varname               value
1   1    5    var1  -0.231227494628982
2   1   10    var1   -1.80887236653438
3   1   15    var1  -0.443229294431553
4   1   20    var1    1.33719337048763
5   1    5    var2   0.673109282347586
6   1   10    var2   -0.42142267953938
7   1   15    var2   0.874367622725874
8   1   20    var2   -1.19917678039462
9   1    5    var3    1.13495606258399
10  1   10    var3 -0.0779385346672042
11  1   15    var3  -0.126775240288037
12  1   20    var3  -0.760739300144526
13  1    5    var4   -1.94626587907069
14  1   10    var4    1.25643195699455
15  1   15    var4   -0.50986941213717
16  1   20    var4   -1.01324846239812
> first
   id varname            value.5            value.10           value.15
1   1    var1 -0.231227494628982   -1.80887236653438 -0.443229294431553
5   1    var2  0.673109282347586   -0.42142267953938  0.874367622725874
9   1    var3   1.13495606258399 -0.0779385346672042 -0.126775240288037
13  1    var4  -1.94626587907069    1.25643195699455  -0.50986941213717
             value.20
1    1.33719337048763
5   -1.19917678039462
9  -0.760739300144526
13  -1.01324846239812
> second
  id       value.5.var1     value.10.var1      value.15.var1    value.20.var1
1  1 -0.231227494628982 -1.80887236653438 -0.443229294431553 1.33719337048763
       value.5.var2     value.10.var2     value.15.var2     value.20.var2
1 0.673109282347586 -0.42142267953938 0.874367622725874 -1.19917678039462
      value.5.var3       value.10.var3      value.15.var3      value.20.var3
1 1.13495606258399 -0.0779385346672042 -0.126775240288037 -0.760739300144526
       value.5.var4    value.10.var4     value.15.var4     value.20.var4
1 -1.94626587907069 1.25643195699455 -0.50986941213717 -1.01324846239812
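
The renaming could be done with one regular expression. This is only a sketch, assuming the column names follow the value.<time>.<varname> pattern shown in the output and that the data are built with data.frame() (the values are placeholders):

```r
# Toy data shaped like the example above (values are placeholders)
tmpdata <- data.frame(
  id = 1,
  time = rep(c(5, 10, 15, 20), 4),
  varname = rep(paste0("var", 1:4), each = 4),
  value = seq_len(16) / 10
)

# Same two reshape steps as in the answer
first <- reshape(tmpdata, timevar = "time", idvar = c("id", "varname"),
                 direction = "wide")
second <- reshape(first, timevar = "varname", idvar = "id", direction = "wide")

# Turn "value.5.var1" into "var1.5"; names that do not match (such as "id") are left alone
names(second) <- sub("^value\\.([0-9]+)\\.(.+)$", "\\2.\\1", names(second))
```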
richardh
You may also want to check out Hadley Wickham's `reshape` package (I have never used it).
richardh
Thank you richardh, your solution worked, but I accepted Hadley's code using the reshape package because the new variable names are exactly the way I wanted (var1_5, var1_10, etc.) without additional lines of code to rename them.
sheed03
@sheed03 -- No worries. Hadley's way is most definitely the way to do it. But I noticed that his changes the order of the columns (i.e., it puts the time 5 value to the far right), so make sure you take a look at the output.
richardh
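
A possible explanation for that column order, offered as an assumption rather than something confirmed in this thread: cbind() on mixed numeric and character vectors produces a character matrix, so time is no longer numeric in tmpdata and may be ordered as text, with "5" after "20". A quick base R check:

```r
id <- as.numeric(rep(1, 16))
time <- rep(c(5, 10, 15, 20), 4)
varname <- rep(paste0("var", 1:4), each = 4)
value <- rnorm(16)

# As in the question: cbind() first coerces every column to character
coerced <- as.data.frame(cbind(id, time, varname, value))

# Alternative: data.frame() keeps each column's own type
typed <- data.frame(id, time, varname, value)

# Text ordering of the coerced time values versus numeric ordering
text_order <- sort(unique(as.character(coerced$time)))
numeric_order <- sort(unique(typed$time))
```

If this is the cause, building the data with data.frame() instead of as.data.frame(cbind(...)) should keep the time columns in numeric order.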
+1  A: 

I gave up on the reshape() command 2 years ago. It seems figuring that damn thing out each time was actually harder than just doing it the 'hard' way, which is much more flexible.

Your data in your example are all nicely sorted. You might have to sort your real data by var name and time first.

(renamed your tmpdata to tmp, made value numeric)

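# Assumes tmp is sorted by id, varname and time, and that every id has the same varname/time combinations
y <- lapply(split(tmp, tmp$id), function(x) x$value)  # one vector of values per id
df <- data.frame(unique(tmp$id), do.call(rbind, y))   # one row per id
first_id <- tmp$id == tmp$id[1]                       # rows of the first id supply the column labels
names(df) <- c('id', paste(tmp$varname[first_id], tmp$time[first_id], sep = '.'))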
John
+5  A: 

This is trivial with the reshape package:

library(reshape)
cast(tmpdata, ... ~ varname + time)
hadley
Thank you Hadley, your code does exactly what I am looking for. For my reference, I replaced the ... with id so I can remember this for future examples.
sheed03
In this context `...` means all other variables not already included in the cast specification. You shouldn't need to replace it with actual variable names, unless you are doing aggregation.
hadley