ansaurus

Question

Classify or cut a data frame by a list of class ranges and summarize it with ddply

Answer 1

+1 A:

You don't really need plyr for this; you can use the reshape package instead:

## Load the reshape package (provides melt and cast)
library(reshape)
## Pull what you need
dfx <- df[c("v_seed", "v_time", "v_rank", "v_perco")]
## Bring in your cuts (values of exactly 10 fall in the lower bin)
dfx <- data.frame(dfx, ifelse(df$v_perco > 10, "(10,20]", "[0,10]"))
## Rename v_cut
colnames(dfx)[ncol(dfx)] <- "v_cut"
## Melt it.
dfx <- melt(dfx, id=c("v_cut", "v_seed", "v_time"))
## Cast it.
dfx <- cast(dfx, v_cut + v_time + v_seed ~ variable, c(mean,min,max,sd))

If you only want the mean, replace the last line with:

dfx <- cast(dfx, v_cut + v_time + v_seed ~ variable, mean)

Type "dfx" and you'll see a data frame with what you asked for.

Brandon Bertelsen 2010-10-07 17:49:46

Thanks for helping. I'm trying your solution, but I have a problem with the "cast" line: "bound" doesn't exist in the df data frame. Do you have any good documentation for this function? ?cast and ?melt look cryptic :s

reyman64 2010-10-07 19:18:32

Whoopsie, "bound" should be v_cut.

Brandon Bertelsen 2010-10-07 19:29:55

I'm not sure what you want from v_cut. The cuts provided do not break it into bins of width 10; rather, n=10 means 10 bins. I think what you want is cut_interval(x, length=10).
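
For reference, a minimal base R sketch of the difference between asking for a number of bins and asking for a bin width. It uses cut() as a stand-in for cut_interval(), and the v_perco values are made up for illustration:

```r
# Hypothetical values; only the column name v_perco comes from the thread.
v_perco <- c(0, 4, 10, 10.5, 20)

# A single number for `breaks` asks for that many equal-width bins.
by_count <- cut(v_perco, breaks = 10)

# A vector of break points gives bins of width 10 instead.
# include.lowest = TRUE closes the first interval so that 0 is kept.
by_width <- cut(v_perco, breaks = seq(0, 20, by = 10), include.lowest = TRUE)

nlevels(by_count)   # number of bins when breaks = 10
table(by_width)     # counts per width-10 bin
```

Note that a value of exactly 10 lands in the lower bin here, because cut() closes intervals on the right by default.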

Brandon Bertelsen 2010-10-07 19:50:11

Yes, I corrected the post :) Thanks for your answer, it works!

reyman64 2010-10-07 19:55:19

Hmm, it seems there is a problem with the ifelse call. I get the [0,10] value in the v_cut column for v_perco > 10, and vice versa.

reyman64 2010-10-07 20:17:52

Updated! Hope that works for ya

Brandon Bertelsen 2010-10-07 20:38:53

Yeah, it works! I updated the original post with another question and another type of result...

reyman64 2010-10-08 10:19:16

Answer 2

+1 A:

You're just having a problem with syntax is all:

## Load plyr (provides ddply)
library(plyr)
## Add your cut (values of exactly 10 fall in the lower bin)
df.new <- data.frame(df, ifelse(df$v_perco > 10, "(10,20]", "[0,10]"))
## Rename v_cut
colnames(df.new)[ncol(df.new)] <- "v_cut"

## Careful here: read the note below
df.new <- ddply(df.new, .(v_idn, v_time), function(x) unique(data.frame(
  mean = mean(x$v_rank),
  v_cut = x$v_cut
)))

Alternatively:

ddply(df.new, .(v_idn, v_time), summarise, mean=mean(v_rank))

With ".(v_idn, v_time)" you're telling ddply that for each combination of v_idn and v_time, you want it to calculate the mean of v_rank.
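
To make the grouping concrete, here is a small sketch on made-up data. The column names follow the thread, the values are hypothetical, and mean_rank is just an illustrative output name:

```r
library(plyr)

# Hypothetical toy data; only the column names come from the thread.
df <- data.frame(
  v_idn  = c(1, 1, 1, 2, 2, 2),
  v_time = c(1, 1, 2, 1, 1, 2),
  v_rank = c(2, 4, 6, 1, 3, 5)
)

# One output row per (v_idn, v_time) combination, holding the mean of v_rank.
res <- ddply(df, .(v_idn, v_time), summarise, mean_rank = mean(v_rank))
print(res)
```

There are four distinct (v_idn, v_time) combinations in this toy data, so the result has four rows.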

Brandon Bertelsen 2010-10-08 15:43:24
