Dear StackOverFlowers (flowers in short),

I have a list of data.frames (walk.sample) that I would like to collapse into a single (giant) data.frame. While collapsing, I would like to mark (by adding another column) which element of the list each row came from. This is what I've got so far.

This is the list of data.frames that needs to be collapsed/stacked.

> walk.sample
[[1]]
     walker        x         y
1073      3 228.8756 -726.9198
1086      3 226.7393 -722.5561
1081      3 219.8005 -728.3990
1089      3 225.2239 -727.7422
1032      3 233.1753 -731.5526

[[2]]
     walker        x         y
1008      3 205.9104 -775.7488
1022      3 208.3638 -723.8616
1072      3 233.8807 -718.0974
1064      3 217.0028 -689.7917
1026      3 234.1824 -723.7423

[[3]]
[1] 3

[[4]]
     walker        x         y
546       2 629.9041  831.0852
524       2 627.8698  873.3774
578       2 572.3312  838.7587
513       2 633.0598  871.7559
538       2 636.3088  836.6325
1079      3 206.3683 -729.6257
1095      3 239.9884 -748.2637
1005      3 197.2960 -780.4704
1045      3 245.1900 -694.3566
1026      3 234.1824 -723.7423

I have written a function that adds a column denoting which list element the rows came from and then appends the result to an existing data.frame.

collapseToDataFrame <- function(x) { # collapse list to a dataframe with a twist
    walk.df <- data.frame()
    for (i in seq_along(x)) { # seq_along() is safe even when x is empty
        if (is.data.frame(x[[i]])) { # skip elements that are not data.frames
            n.rows <- nrow(x[[i]])
            temp.df <- cbind(x[[i]], rep(i, n.rows)) # tag rows with list position
            names(temp.df) <- c("walker", "x", "y", "session")
            walk.df <- rbind(walk.df, temp.df)
        } else {
            cat("Empty list", "\n")
        }
    }
    return(walk.df)
}


> collapseToDataFrame(walk.sample)
Empty list 
Empty list 
     walker         x          y session
3         1 -604.5055 -123.18759       1
60        1 -562.0078  -61.24912       1
84        1 -594.4661  -57.20730       1
9         1 -604.2893 -110.09168       1
43        1 -632.2491  -54.52548       1
1028      3  240.3905 -724.67284       1
1040      3  232.5545 -681.61225       1
1073      3  228.8756 -726.91980       1
1091      3  209.0373 -740.96173       1
1036      3  248.7123 -694.47380       1

I'm curious whether this can be done more elegantly, with perhaps do.call() or some other more generic function?

+4  A: 

I'm not claiming this is the most elegant approach, but I think it works.

library(plyr)

ldply(sapply(1:length(walk.sample), function(i) 
           if (length(walk.sample[[i]]) > 1)
           cbind(walk.sample[[i]],session=rep(i,nrow(walk.sample[[i]])))
      ),rbind)

EDIT

After applying Marek's apt remarks

do.call(rbind,lapply(1:length(walk.sample), function(i)
           if (length(walk.sample[[i]]) > 1)
           cbind(walk.sample[[i]],session=i)  ))
gd047
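For anyone who wants to run the edited approach without the original data, here is a minimal, self-contained sketch. The toy list `walk.toy` and the helper name `stack_sessions` are assumptions made up for illustration, and non-data.frame elements are skipped with `is.data.frame()` instead of the `length() > 1` test used above.

```r
# Toy stand-in for walk.sample (made-up values): two data.frames
# and one stray numeric element, as in the question.
walk.toy <- list(
  data.frame(walker = c(3, 3), x = c(228.9, 226.7), y = c(-726.9, -722.6)),
  3,
  data.frame(walker = c(2, 2, 3), x = c(629.9, 627.9, 206.4),
             y = c(831.1, 873.4, -729.6))
)

# Hypothetical helper: tag each data.frame with its list position, then stack.
stack_sessions <- function(x) {
  tagged <- lapply(seq_along(x), function(i) {
    # An `if` without `else` yields NULL for non-data.frame elements.
    if (is.data.frame(x[[i]])) cbind(x[[i]], session = i)
  })
  do.call(rbind, tagged)  # rbind() ignores the NULL entries
}

walk.df <- stack_sessions(walk.toy)
```

Because `session = i` is recycled by `cbind`, no `rep()` is needed, and the skipped element simply leaves a gap in the session numbers (1 and 3 here).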
`cbind` doesn't need the replication; you could just write `session=i`. And without plyr one could use `do.call(rbind, sapply(......))`.
Marek
Hi gd047, I would just like to mention that your solution wouldn't work when the data.frames have different numbers of rows. Also, when the number of rows is the same, the results are not correct (rows and columns get mixed up, and there are no column names).
Tal Galili
I think that replacing `sapply` with `lapply` may help.
Marek
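A small sketch of why that swap matters, using made-up data (`toy`, `tag`, and the values are assumptions for illustration): when every element returns a data.frame with the same number of columns, `sapply` tries to simplify the results into a matrix, whereas `lapply` always returns a plain list that `rbind` can stack.

```r
# Two made-up data.frames with identical columns.
df1 <- data.frame(walker = c(1, 1), x = c(0.5, 1.5), y = c(2, 3))
df2 <- data.frame(walker = c(2, 2), x = c(4.5, 5.5), y = c(6, 7))
toy <- list(df1, df2)

# Hypothetical helper: add the list position as a session column.
tag <- function(i) cbind(toy[[i]], session = i)

simplified <- sapply(seq_along(toy), tag)  # simplified into a list-matrix
kept       <- lapply(seq_along(toy), tag)  # stays a list of data.frames

is.matrix(simplified)            # TRUE: columns of both data.frames are mixed into cells
is.data.frame(kept[[1]])         # TRUE: each element is still intact
stacked <- do.call(rbind, kept)  # stacks cleanly into one data.frame
```

With the data from the question the simplification does not kick in, because one element returns `NULL`, but relying on that is fragile.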
Good job, chaps! Exactly what the doctor ordered.
Roman Luštrik
+6  A: 

I think this will work...

lengths <- sapply(walk.sample, function(x) if (is.null(nrow(x))) 0 else nrow(x))
cbind(do.call(rbind, walk.sample[lengths > 1]),
      session = rep(1:length(lengths), ifelse(lengths > 1, lengths, 0)))
Jonathan Chang
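A runnable sketch of the same vectorised idea on made-up data (`walk.toy` and its values are assumptions, not the question's data). As a deliberate variation, it counts rows only for real data.frames and keeps one-row data.frames, which the `lengths > 1` filter above would drop.

```r
# Toy list: a two-row data.frame, a stray numeric, and a one-row data.frame.
walk.toy <- list(
  data.frame(walker = c(3, 3), x = c(228.9, 226.7), y = c(-726.9, -722.6)),
  3,
  data.frame(walker = 2, x = 629.9, y = 831.1)
)

# Row count per element; anything that is not a data.frame counts as 0 rows.
n.rows <- sapply(walk.toy, function(d) if (is.data.frame(d)) nrow(d) else 0L)

# Stack the non-empty data.frames, then repeat each list position
# once per row it contributed (positions with 0 rows are skipped by rep()).
walk.df <- cbind(do.call(rbind, walk.toy[n.rows > 0]),
                 session = rep(seq_along(n.rows), times = n.rows))
```

This avoids growing a data.frame inside a loop: there is a single `rbind` call and a single `rep` call regardless of how many list elements there are.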
You should use `NROW` instead of `nrow`. For the data from the question, your solution won't work.
Marek
Good catch, NROW is one possible fix, but I dunno what the expected behavior is when you have a 1-row dataframe. I will change it by doing a NULL check instead...
Jonathan Chang
Good solution Jonathan!
Tal Galili