ansaurus

Question

Collapsing data frame by selecting one row per group

Answer 1

+2 A:

Maybe duplicated() can help:

R> d[ !duplicated(d$x), ]
  x  y  z
1 1 10 20
3 2 12 18
4 4 13 17
R>

Edit: Shucks, never mind. This picks the first row in each block of repetitions, but you wanted the last. So here is another attempt using plyr:

R> ddply(d, "x", function(z) tail(z,1))
  x  y  z
1 1 11 19
2 2 12 18
3 4 13 17
R>

Here plyr does the hard work of finding the unique subsets, looping over them, and applying the supplied function -- which simply returns the last row of a block z using tail(z, 1).
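For readers without plyr, the same split/apply/combine steps can be sketched in base R. This is only an illustration: the data frame d is a reconstruction inferred from the printed output above (the original post does not show it), and the names blocks, last_rows and result are made up for this sketch.

```r
# Assumed reconstruction of d, inferred from the printed output above
d <- data.frame(x = c(1, 1, 2, 4),
                y = c(10, 11, 12, 13),
                z = c(20, 19, 18, 17))

# Split the rows into one data frame per distinct value of x
blocks <- split(d, d$x)

# Keep only the last row of each block
last_rows <- lapply(blocks, function(z) tail(z, 1))

# Stack the one-row pieces back into a single data frame
result <- do.call(rbind, last_rows)
rownames(result) <- NULL
result
```

Under that assumed d, result should hold one row per value of x, matching what the ddply call returns.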

Dirk Eddelbuettel 2010-04-13 02:19:35

I'd prefer all the columns, thanks

jkebinger 2010-04-13 02:20:11

So then you need to simply add a 'processing step' to create a factor variable over which plyr can loop. It can all be done with indexing commands, give it a try. And by the way, you are inconsistent between your text (saying first row selected) and example (showing second row).

Dirk Eddelbuettel 2010-04-13 02:51:49

By the way, cross-posting between r-help and here is also somewhat poor style. You got good answers at r-help, so why don't you study them?

Dirk Eddelbuettel 2010-04-13 02:59:15

Sorry about the cross posting, and thanks for the solutions

jkebinger 2010-04-13 12:56:04

My pleasure. As a matter of common best practice here on StackOverflow, you should accept one post as the solution (if you feel it provides one) and vote each helpful post up by clicking on the up arrow. That is how the scoring works here.

Dirk Eddelbuettel 2010-04-13 13:36:57

Answer 2

+2 A:

Just to add a little to what Dirk provided... duplicated has a fromLast argument that you can use to select the last row:

d[ !duplicated(d$x, fromLast=TRUE), ]
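A small side-by-side sketch may make the difference clearer. The data frame d below is an assumption, reconstructed from the output printed in the first answer; first_rows and last_rows are illustrative names only.

```r
# Assumed reconstruction of d, inferred from the output in the first answer
d <- data.frame(x = c(1, 1, 2, 4),
                y = c(10, 11, 12, 13),
                z = c(20, 19, 18, 17))

# Default: the first occurrence of each x is kept
first_rows <- d[!duplicated(d$x), ]

# fromLast = TRUE scans from the end, so the last occurrence is kept
last_rows <- d[!duplicated(d$x, fromLast = TRUE), ]

first_rows
last_rows
```

Both results keep every column of d, since duplicated() is only used to build a logical row index.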

Ian Fellows 2010-04-13 06:00:54

Hi Ian -- unfortunately James never really made a clear case as to whether he wanted first or last and contradicts himself in the post ... but your hint about fromLast is a good one!

Dirk Eddelbuettel 2010-04-13 12:14:43

Thanks, that works like a charm. Whether it's the first or the last I needed was really up to the ordering, and with fromLast I can attack it either way.

jkebinger 2010-04-13 12:54:32

I suggested the same thing and you shot it down on the grounds of 'prefer all columns'. How come that no longer matters?

Dirk Eddelbuettel 2010-04-13 13:35:24

Sorry, Dirk, I misunderstood how duplicated works at the time

jkebinger 2010-04-15 15:17:51
