tags:

views:

78

answers:

3

I have a list of vectors which are time series of inequal length. My ultimate goal is to plot the time series in a ggplot2 graph. I guess I am better off first merging the vectors in a dataframe (where the shorter vectors will be expanded with NAs), also because I want to export the data in a tabular format such as .csv to be perused by other people.

I have a list that contains the names of all the vectors. It is fine that the column titles be set by the first vector, which is the longest. E.g.:

> mylist
[[1]]
[1] "vector1"

[[2]]
[1] "vector2"

[[3]]
[1] "vector3"

etc.

I know the way to go is to use Hadley's plyr package but I guess the problem is that my list contains the names of the vectors, not the vectors themselves, so if I type:

do.call(rbind, mylist)

I get a one-column df containing the names of the dfs I wanted to merge.

> do.call(rbind, actives)
      [,1]           
 [1,] "vector1" 
 [2,] "vector2" 
 [3,] "vector3" 
 [4,] "vector4" 
 [5,] "vector5" 
 [6,] "vector6" 
 [7,] "vector7" 
 [8,] "vector8" 
 [9,] "vector9" 
[10,] "vector10"

etc.

Even if I create a list with the object themselves, I get an empty dataframe :

mylist <- list(vector1, vector2)
mylist
[[1]]
        1         2         3         4         5         6         7         8         9        10        11        12 
0.1875000 0.2954545 0.3295455 0.2840909 0.3011364 0.3863636 0.3863636 0.3295455 0.2954545 0.3295455 0.3238636 0.2443182 
       13        14        15        16        17        18        19        20        21        22        23        24 
0.2386364 0.2386364 0.3238636 0.2784091 0.3181818 0.3238636 0.3693182 0.3579545 0.2954545 0.3125000 0.3068182 0.3125000 
       25        26        27        28        29        30        31        32        33        34        35        36 
0.2727273 0.2897727 0.2897727 0.2727273 0.2840909 0.3352273 0.3181818 0.3181818 0.3409091 0.3465909 0.3238636 0.3125000 
       37        38        39        40        41        42        43        44        45        46        47        48 
0.3125000 0.3068182 0.2897727 0.2727273 0.2840909 0.3011364 0.3181818 0.2329545 0.3068182 0.2386364 0.2556818 0.2215909 
       49        50        51        52        53        54        55        56        57        58        59        60 
0.2784091 0.2784091 0.2613636 0.2329545 0.2443182 0.2727273 0.2784091 0.2727273 0.2556818 0.2500000 0.2159091 0.2329545 
       61 
0.2556818 

[[2]]
        1         2         3         4         5         6         7         8         9        10        11        12 
0.2824427 0.3664122 0.3053435 0.3091603 0.3435115 0.3244275 0.3320611 0.3129771 0.3091603 0.3129771 0.2519084 0.2557252 
       13        14        15        16        17        18        19        20        21        22        23        24 
0.2595420 0.2671756 0.2748092 0.2633588 0.2862595 0.3549618 0.2786260 0.2633588 0.2938931 0.2900763 0.2480916 0.2748092 
       25        26        27        28        29        30        31        32        33        34        35        36 
0.2786260 0.2862595 0.2862595 0.2709924 0.2748092 0.3396947 0.2977099 0.2977099 0.2824427 0.3053435 0.3129771 0.2977099 
       37        38        39        40        41        42        43        44        45        46        47        48 
0.3320611 0.3053435 0.2709924 0.2671756 0.2786260 0.3015267 0.2824427 0.2786260 0.2595420 0.2595420 0.2442748 0.2099237 
       49        50        51        52        53        54        55        56        57        58        59        60 
0.2022901 0.2251908 0.2099237 0.2213740 0.2213740 0.2480916 0.2366412 0.2251908 0.2442748 0.2022901 0.1793893 0.2022901 

but

do.call(rbind.fill, mylist)
data frame with 0 columns and 0 rows

I have tried converting the vectors to dataframes, but there is no cbind.fill function, so plyr complains that the dataframes are of different length.

So my questions are:

  • Is this the best approach? Keep in mind that the goals are a) a ggplot2 graph and b) a table with the time series, to be viewed outside of R

  • What is the best way to get a list of objects starting with a list of the names of those objects?

  • What the best type of graph to highlight the patterns of 60 timeseries? The scale is the same, but I predict there'll be a lot of overplotting. Since this is a cohort analysis, it might be useful to use color to highlight the different cohorts in terms of recency (as a continuous variable). But how to avoid overplotting? The differences will be minimal so faceting might leave the viewer unable to grasp the difference.

+1  A: 

You can't. A data.frame() has to be rectangular; but recycling rules assures that the shorter vectors get expanded.

So you may have a different error here -- the data that you want to rbind is not suitable, maybe ? -- but is hard to tell as you did not supply a reproducible example.

Edit Given your update, you get precisely what you asked for: a list of names gets combined by rbind. If you want the underlying data to appear, you need to involve get() or another data accessor.

Dirk Eddelbuettel
I edited the question stating my goal and giving an example
Roberto
+3  A: 

I think that you may be approaching this the wrong way:

If you have time series of unequal length then the absolute best thing to do is to keep them as time series and merge them. Most time series packages allow this. So you will end up with a multi-variate time series and each value will be properly associated with the same date.

So put your time series into zoo objects, merge them, then use my qplot.zoo function to plot them. That will deal with switching from zoo into a long data frame.

Here's an example:

> z1 <- zoo(1:8, 1:8)
> z2 <- zoo(2:8, 2:8)
> z3 <- zoo(4:8, 4:8)
> nm <- list("z1", "z2", "z3")
> z <- zoo()
> for(i in 1:length(nm)) z <- merge(z, get(nm[[i]]))
> names(z) <- unlist(nm)
> z
  z1 z2 z3
1  1 NA NA
2  2  2 NA
3  3  3 NA
4  4  4  4
5  5  5  5
6  6  6  6
7  7  7  7
8  8  8  8
> 
> x.df <- data.frame(dates=index(x), coredata(x))
> x.df <- melt(x.df, id="dates", variable="val")
> ggplot(na.omit(x.df), aes(x=dates, y=value, group=val, colour=val)) + geom_line() + opts(legend.position = "none")
Shane
Shane: thanks for providing an alternative approach. What's the sytax to merge the object when I have a list with all their names?
Roberto
Also I should add the vector names are not dates, but from a quick look at the documentation, I think zoo is agnostic to that
Roberto
+1  A: 

If you're doing it just because ggplot2 (as well as many other things) like data frames then what you're missing is that you need the data in long format data frames. Yes, you just put all of your response variables in one column concatenated together. Then you would have 1 or more other columns that identify what makes those responses different. That's the best way to have it set up for things like ggplot.

John