ansaurus

Question

Select rows with the largest value of a variable within a group in R

Answer 1

A:

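# Example data: a grouping variable (a.2) and a value variable (b.2)
a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)
# Split the data frame into one piece per group
m <- split(a.3, a.2)
# Return the original row number of the largest b.2 value in a piece
u <- function(x) {
    a <- rownames(x)
    b <- which.max(x[, 2])
    as.numeric(a[b])
}
r <- sapply(m, u)

a.3[r, ]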

This does the trick, although it is somewhat cumbersome. It lets me grab the rows that hold the largest value within each group. Any other ideas?
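
The split-based idea above can probably be written more compactly. The following is only a sketch, not the poster's code: the `set.seed` call and the name `top` are assumptions added for illustration.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)

# Split by group, keep the row with the largest b.2 in each piece,
# then bind the pieces back into one data frame
top <- do.call(rbind, lapply(split(a.3, a.3$a.2),
                             function(d) d[which.max(d$b.2), ]))
print(top)
```

Note that `which.max` returns only the first maximum, so a group with tied maxima contributes a single row here.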

Misha 2010-05-12 22:06:02

Answer 2

A:

> a.2 <- sample(1:10, 100, replace = TRUE)
> b.2 <- sample(1:100, 100, replace = TRUE)
> tapply(b.2, a.2, max)
 1  2  3  4  5  6  7  8  9 10 
99 92 96 97 98 99 94 98 98 96

Jonathan Chang 2010-05-12 23:09:11

Answer 3

+1 A:

a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)

The answer by Jonathan Chang gets you what you explicitly asked for, but I'm guessing that you want the actual row from the data frame.

sel <- ave(b.2, a.2, FUN = max) == b.2
a.3[sel,]

John 2010-05-12 23:35:41

That was much simpler, I must admit. However, the logic behind the == b.2 is beyond me.

Misha 2010-05-12 23:59:51

The ave call generates a vector, as long as b.2, that holds the maximum of b.2 within each row's a.2 group. Comparing it with == b.2 therefore yields a logical vector with one TRUE or FALSE per row of the data frame, and you use that logical vector to select rows. To see how it works, add the result of the ave command to your data frame and compare it with the b.2 column -- a.3$b.max <- ave(b.2, a.2, FUN = max) . You could also store the sel variable and look at it in context with -- a.3$sel <- a.3$b.2 == a.3$b.max
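
A tiny worked case may make the explanation above easier to follow. This is a sketch on made-up values; the data frame `df` and its columns `g`, `v`, `v.max` and `sel` are hypothetical names, not part of the original data.

```r
# Small hand-made data frame (hypothetical values)
df <- data.frame(g = c(1, 1, 2, 2, 2), v = c(5, 9, 3, 7, 7))

# ave() returns one value per row: the maximum of v within that row's group
df$v.max <- ave(df$v, df$g, FUN = max)

# TRUE where the row's own value equals its group maximum
df$sel <- df$v == df$v.max

# Logical indexing keeps only the TRUE rows
result <- df[df$sel, c("g", "v")]
print(result)
```

Here group 2 has a tied maximum, so both tied rows are kept. That differs from a which.max-based approach, which keeps only the first.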

John 2010-05-13 02:05:06

Thanks, I appreciate your efforts.

Misha 2010-05-13 07:57:12

Answer 4

A:

library(plyr)
ddply(a.3, "a.2", subset, b.2 == max(b.2))

hadley 2010-05-13 12:54:08

I tried the ddply function, but it is painfully slow. I didn't time it, but it took a cup of coffee and a trip to the bathroom, whereas the ave version needed only 0.2 s on my original dataset (210 columns by 16000 rows).
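
For anyone who wants to measure rather than estimate, base R's `system.time` can wrap the call. This is a rough sketch on synthetic data: the data frame `big`, the 500 groups and the seed are assumptions, and only the row count matches the dataset described above.

```r
# Synthetic stand-in for the real data: 16000 rows, hypothetical group count
set.seed(42)
n <- 16000
big <- data.frame(g = sample(1:500, n, replace = TRUE),
                  v = sample(1:100, n, replace = TRUE))

# system.time() reports user, system and elapsed seconds for the expression
timing <- system.time(
  picked <- big[ave(big$v, big$g, FUN = max) == big$v, ]
)
print(timing["elapsed"])
```

The same wrapper can be put around the ddply call to compare the two on identical input.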

Misha 2010-05-13 22:52:09

That'll be fixed in the next version. But you can't expect to get answers that will work with your data unless you supply a realistic example!

hadley 2010-05-14 03:04:21
