I have a data frame with gaps like this:

           Var1    Var2    Var3
1            NA      NA      NA
2            NA      NA      NA
3            NA      NA      NA
4            NA 0.06703      NA
5            NA 0.08639      NA
6            NA 0.19023 0.02322
7            NA 0.31764 0.08058
8            NA 0.44426 0.15081
9            NA 0.37529 0.29595
10           NA 0.40029 0.29274
11           NA 0.33828 0.39168
12      0.01595 0.31432 0.43192
13      0.05217 0.28560 0.48150
14      0.07196 0.32588 0.56065
15      0.08771 0.26301 0.68131

When I run melt(), I drop the NA cells with melt(df, na.rm = TRUE). I would also like to add a new column that gives each value's row number within its variable group.

So my results look like this:

variable    value
    Var1   0.01595
    Var1   0.05217
    Var1   0.07196
    Var1   0.08771
    Var2   0.06703
    Var2   0.08639
...etc

and I want them to look like this:

variable    value    index
    Var1   0.01595   1
    Var1   0.05217   2
    Var1   0.07196   3
    Var1   0.08771   4
    Var2   0.06703   1
    Var2   0.08639   2
...etc

What's the best way to generate these internal row numbers, whether before, during, or after the melt process?

+2  A: 

Take a look at this previous question about auto-incremented cohort counts; I think it covers what you want to do. If so, it is probably easiest (at least for me) to do this as a separate operation with plyr on the melted data.

Here's the gist:

library(plyr)
ddply(df, .(variable), function(x) data.frame(x, NewID = seq_len(nrow(x))))
JD Long
That thread definitely has the answer. Thanks.
MW Frost
For posterity: `ddply(df, .(variable), function(x) data.frame(x, NewID=1:nrow(x)))`
MW Frost
well for posterity I might as well put it in the answer ;)
JD Long