ansaurus

Question

Answer 1

+3 A:

Just add a new column each time plyr calls you:

R> DF <- data.frame(kmer=sample(1:3, 50, replace=TRUE),
                    cvCut=sample(LETTERS[1:3], 50, replace=TRUE))
R> library(plyr)
R> ddply(DF, .(kmer, cvCut), function(X) data.frame(X, newId=1:nrow(X)))
   kmer cvCut newId
1     1     A     1
2     1     A     2
3     1     A     3
4     1     A     4
5     1     A     5
6     1     A     6
7     1     A     7
8     1     A     8
9     1     A     9
10    1     A    10
11    1     A    11
12    1     B     1
13    1     B     2
14    1     B     3
15    1     B     4
16    1     B     5
17    1     B     6
18    1     C     1
19    1     C     2
20    1     C     3
21    2     A     1
22    2     A     2
23    2     A     3
24    2     A     4
25    2     A     5
26    2     B     1
27    2     B     2
28    2     B     3
29    2     B     4
30    2     B     5
31    2     B     6
32    2     B     7
33    2     C     1
34    2     C     2
35    2     C     3
36    2     C     4
37    3     A     1
38    3     A     2
39    3     A     3
40    3     A     4
41    3     B     1
42    3     B     2
43    3     B     3
44    3     B     4
45    3     C     1
46    3     C     2
47    3     C     3
48    3     C     4
49    3     C     5
50    3     C     6
R>

Dirk Eddelbuettel 2010-02-02 20:36:33

+1 You beat me by 17 seconds! Argh!

Shane 2010-02-02 20:37:45

It's a virtual tie, especially as our solutions are so alike. I had yours at first but didn't like the name of the added column, so it was back to the shed for `data.frame()` in lieu of `cbind()` ;-)

Dirk Eddelbuettel 2010-02-02 20:40:12

looks good - thanks

jermdemo 2010-02-02 20:44:02

Answer 2

+1 A:

I think that this is what you want:

Load the data:

x <- read.table(textConnection(
"id      size kmer cvCut   cumsum
1      8132   23    10     8132
10000   778   23    10 13789274
30000   324   23    10 23658740
50000   182   23    10 28534840
100000   65   23    10 33943283
200000   25   23    10 37954383
250000  584   23    12 16546507
300000  110   23    12 29435303
400000   28   23    12 34697860
600000  127   23     2 47124443
600001  127   23     2 47124570"), header=TRUE)

Use ddply:

library(plyr)
ddply(x, .(kmer, cvCut), function(x) cbind(x, 1:nrow(x)))

Shane 2010-02-02 20:36:51

Well with `data.frame()` we get to set the desired column label `newId` as well :)

Dirk Eddelbuettel 2010-02-02 20:38:04

Very true. But I *did* use the supplied data. :)

Shane 2010-02-02 20:39:54
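Following up on the column-label point in the comments above: `cbind()` also accepts named arguments, so the counter column can be labelled there too. The sketch below assumes plyr is installed and uses a small made-up subset of the sample data; the name `newId` and the variable `g` are just illustrative choices.

```r
library(plyr)

# Small illustrative subset of the sample data (assumption: only these columns matter)
x <- data.frame(
  size  = c(8132, 778, 584, 127),
  kmer  = c(23, 23, 23, 23),
  cvCut = c(10, 10, 12, 2)
)

# Passing the counter as a named argument gives the new column the label "newId"
# instead of the deparsed expression
res <- ddply(x, .(kmer, cvCut), function(g) cbind(g, newId = seq_len(nrow(g))))
print(res)
```

`seq_len(nrow(g))` is used here instead of `1:nrow(g)` only as a defensive habit; for the non-empty groups that ddply passes in, the two give the same result.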

Answer 3

+7 A:

I'd do it like this:

library(plyr)
# df is your data frame, e.g. x or DF from the other answers
ddply(df, c("kmer", "cvCut"), transform, newID = seq_along(kmer))

hadley 2010-02-02 21:16:22
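For anyone who wants to run the `transform` version end to end, here is a minimal self-contained sketch. It assumes plyr is installed; the data frame is a made-up subset of the sample data from Answer 2, bound to the name `df` purely for illustration.

```r
library(plyr)

# Small illustrative data frame (values borrowed from the sample data in Answer 2)
df <- data.frame(
  id    = c(1, 10000, 30000, 250000, 300000, 600000),
  kmer  = c(23, 23, 23, 23, 23, 23),
  cvCut = c(10, 10, 10, 12, 12, 2)
)

# transform() is applied once per (kmer, cvCut) group, so seq_along(kmer)
# restarts at 1 within each group
res <- ddply(df, c("kmer", "cvCut"), transform, newID = seq_along(kmer))
print(res)
```

Because `transform` is passed straight to `ddply`, there is no need for an anonymous function, and the new column gets its name from the `newID =` argument.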


How do I use plyr to number rows?
