I have a large data.frame that I'd like to reduce by keeping only the rows above a given quantile, computed separately within each level of one of the variables. For example:

x <- rep(1:10, times = 10)
df <- data.frame(x,rnorm(100))

df2 <- subset(df, x == 1)
df3 <- subset(df2, rnorm.100. > quantile(rnorm.100., 0.8))

What I would like to end up with is a single data.frame that contains this quantile subset for every value of x (1, 2, 3, ..., 10).

Is there a way to do this with ddply?

+3  A: 

You could try:

ddply(df, .(x), subset, rnorm.100. > quantile(rnorm.100., 0.8))

Off topic: you can name a column on the fly with df <- data.frame(x, y = rnorm(100)).
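
If plyr is not available, the same split-apply-combine idea can be sketched in base R. This is only an illustration, not part of the original answer: the seed, the column name y (following the naming tip above), and the object names pieces and top20 are all assumptions.

```r
# Reproducible version of the example data. The column name y and the seed
# are assumptions; the question's column is auto-named rnorm.100.
set.seed(42)
df <- data.frame(x = rep(1:10, times = 10), y = rnorm(100))

# Split by x, keep the rows above each group's 80th percentile, recombine.
pieces <- lapply(split(df, df$x), function(g) g[g$y > quantile(g$y, 0.8), ])
top20 <- do.call(rbind, pieces)

# Count the surviving rows per level of x.
table(top20$x)
```

With 10 distinct values per group and R's default quantile type, the 80th percentile falls between the 8th and 9th ordered values, so the two largest rows in each group should be kept.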

Marek
Thanks Marek, for the answer and the tip about specifying a colname on the fly - although not requested, it was something I was wondering how to do!
Brandon Bertelsen
+2  A: 

Here's a different approach using the little-used ave() function, which is very fast to compute.

Make a new column that contains the quantile calculated within each level of x:

df$quantByX <-  ave(df$rnorm.100., df$x, FUN = function (x) quantile(x,0.8))

Select the items of the new column and the x column.

df2 <- unique(df[,c(1,3)])

The result is one data frame with the unique items in the x column and the calculated quantile for each level of x.
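
For reference, here is a self-contained sketch of those two steps with a cross-check against tapply(). The seed, the column name y, and the object names summary_df and check are assumptions made for the illustration, and columns are selected by name rather than by position.

```r
# Example data; the seed and the column name y are assumptions.
set.seed(42)
df <- data.frame(x = rep(1:10, times = 10), y = rnorm(100))

# ave() returns a vector as long as its input: every row receives the
# 80th percentile of its own group.
df$quantByX <- ave(df$y, df$x, FUN = function(v) quantile(v, 0.8))

# Collapse to one row per level of x.
summary_df <- unique(df[, c("x", "quantByX")])

# Cross-check: tapply() computes the same per-group quantiles directly.
check <- tapply(df$y, df$x, quantile, probs = 0.8)
all.equal(as.vector(check), summary_df$quantByX[order(summary_df$x)])
```

Note that this produces a summary table of quantiles, one per level of x, rather than a subset of the original rows.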

John
`ave` is one of the most powerful R functions. But in this case I think you should use it this way: `subset(df, rnorm.100. > ave(rnorm.100., x, FUN=function(v) quantile(v, 0.8)))`
Marek
that clarifies the question for me... :)
John
I've not had the opportunity to try this function before. Marek's solution above works well for my purposes. But thank you for this as well, I'll look into "ave".
Brandon Bertelsen