a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

r<-sapply(split(a.3,a.2),function(x) which.max(x$b.2))

a.3[r,]

This returns each maximum's position within its own split group, not its row index in the entire data.frame.

I'm trying to return the largest value of b.2 for each subgroup of a.2. How can I do this efficiently?
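
The mismatch is easy to reproduce on a tiny data frame. The sketch below uses hypothetical toy data (the names d, g and v are illustrative, not from the post) so that the result can be checked by eye.

```r
# Hypothetical toy data: two groups (g), three rows each.
d <- data.frame(g = c(1, 2, 1, 2, 1, 2),
                v = c(5, 7, 9, 3, 2, 8))

# which.max() runs on each piece produced by split(), so it reports a
# position inside that piece, not a row number of d.
pos <- sapply(split(d, d$g), function(x) which.max(x$v))
pos

# Using those positions as row numbers of d picks rows 2 and 3,
# although the group maxima actually sit in rows 3 and 6.
wrong <- d[pos, ]
wrong
```

Group 1 occupies rows 1, 3 and 5 of d, so its position 2 corresponds to row 3, while d[pos, ] reads it as row 2.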

A: 
a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)
m<-split(a.3,a.2)
u<-function(x){
    a<-rownames(x)
    b<-which.max(x[,2])
    as.numeric(a[b])
    }
r<-sapply(m,FUN=function(x) u(x))

a.3[r,]

This does the trick, although it is somewhat cumbersome. It lets me grab the rows that hold the groupwise largest values. Any other ideas?

Misha
A: 
> a.2<-sample(1:10,100,replace=T)
> b.2<-sample(1:100,100,replace=T)
> tapply(b.2, a.2, max)
 1  2  3  4  5  6  7  8  9 10 
99 92 96 97 98 99 94 98 98 96 
Jonathan Chang
A: 
a.2<-sample(1:10,100,replace=T)
b.2<-sample(1:100,100,replace=T)
a.3<-data.frame(a.2,b.2)

The answer by Jonathan Chang gets you what you explicitly asked for, but I'm guessing that you want the actual row from the data frame.

sel <- ave(b.2, a.2, FUN = max) == b.2
a.3[sel,]
John
That was much simpler, I must admit. However, the logic behind the == b.2 is beyond me...
Misha
ave generates a vector, as long as b.2, that holds the max of b.2 within each row's a.2 group. Comparing it to b.2 with == therefore gives a logical vector with one element per row of the data frame, TRUE wherever a row holds its group's maximum. You then use that logical vector to select rows from the data frame. To see how it works, add the result of the ave command to your data frame and compare it with the b.2 column: a.3$b.max <- ave(b.2, a.2, FUN = max). You could also store the selection and look at it in context: a.3$sel <- a.3$b.2 == a.3$b.max
John
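
The explanation above can be followed step by step on a small data frame. This is a minimal sketch on hypothetical toy data (d, g and v are illustrative names, not the poster's a.3), chosen so that one group contains a tie.

```r
# Hypothetical toy data: group 1 has two rows, group 2 has three.
d <- data.frame(g = c(1, 1, 2, 2, 2),
                v = c(3, 8, 5, 9, 9))

# ave() returns a vector as long as v, holding each row's group maximum.
d$v.max <- ave(d$v, d$g, FUN = max)

# TRUE where a row's value equals the maximum of its group.
d$sel <- d$v == d$v.max

# The logical vector selects the matching rows.
d[d$sel, c("g", "v")]
```

Because the comparison is done row by row, every row tied for its group's maximum is kept (both rows with v equal to 9 here), whereas which.max returns only the first one.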
Thanks, I appreciate your efforts.
Misha
A: 
library(plyr)
ddply(a.3, "a.2", subset, b.2 == max(b.2))
hadley
I tried using the ddply function, but it is painfully slow. I didn't time it, but it took a cup of coffee and a trip to the bathroom, whilst the ave version needed only 0.2 s on my original dataset (210 columns × 16000 rows).
Misha
That'll be fixed in the next version. But you can't expect to get answers that will work with your data unless you supply a realistic example!
hadley
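
For anyone wanting to compare approaches at that scale, a test data set of the stated size can be simulated. This is only a sketch: the 210 columns × 16000 rows come from the comment above, but the number of groups, the filler columns and the seed are assumptions, since the original data was never posted.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 16000

# Same two columns as the toy example, at the reported row count.
# Ten groups is an assumption carried over from the question.
big <- data.frame(a.2 = sample(1:10, n, replace = TRUE),
                  b.2 = sample(1:100, n, replace = TRUE))

# Hypothetical filler columns to reach 210 columns in total.
filler <- as.data.frame(matrix(rnorm(n * 208), nrow = n))
big <- cbind(big, filler)

# Time the ave-based selection; results will vary by machine.
timing <- system.time(
  sel <- big$b.2 == ave(big$b.2, big$a.2, FUN = max)
)
res <- big[sel, ]
timing
```

The same big data frame can then be passed to any of the other answers and wrapped in system.time for a like-for-like comparison.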