ansaurus

Question

How to do median splits within factor levels in R?

Answer 1

+1 A:

Here is a hack-ish way. Hadley may come up with something more elegant:

To start, we simply concatenate the by() output:

 R> do.call(c,byOutput)
A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5 
 1  2  2  1  1  1  1  2  1  2  1  2  1  1  2

What matters is that we get the values 1 and 2 here, which we can use to index a vector of labels and build a new factor:

R> c("Below","Above")[do.call(c,byOutput)]
 [1] "Below" "Above" "Above" "Below" "Below" "Below" "Below" "Above"
 [9] "Below" "Above" "Below" "Above" "Below" "Below" "Above"
R> as.factor(c("Below","Above")[do.call(c,byOutput)])
[1] Below Above Above Below Below Below Below Above Below Above 
[11] Below Above Below Below Above
Levels: Above Below

which we can then assign into the data.frame you wanted to modify:

R> myDataFrame$FactorLevelMedianSplit <- 
      as.factor(c("Below","Above")[do.call(c,byOutput)])

Update: Never mind. The concatenated by() output is grouped by factor level, so myDataFrame would also need to be sorted A A ... A B ... B C ... C before we add the new column. Left as an exercise...
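For what it's worth, the reindexing could be sketched roughly as below. This is only an illustration: the construction of byOutput is not shown in this answer, so the by() call here is an assumed reconstruction (1 for values at or below the group median, 2 for values above), and the data are made up with set.seed() for repeatability. The name `sorted` is also just a placeholder.

```r
# Illustrative data; column names follow the thread, values are made up.
set.seed(1)
myDataFrame <- data.frame(myData = runif(15),
                          myFactor = rep(c("A", "B", "C"), 5))

# Sort the rows so they run A ... A, B ... B, C ... C,
# which is the order in which by() returns its groups.
sorted <- myDataFrame[order(myDataFrame$myFactor), ]

# Assumed reconstruction of byOutput:
# 1 = at or below the group median, 2 = above it.
byOutput <- by(sorted, sorted$myFactor,
               function(d) as.integer(d$myData > median(d$myData)) + 1L)

# Concatenate the per-group codes and turn them into labels.
sorted$FactorLevelMedianSplit <-
  factor(c("Below", "Above")[do.call(c, byOutput)],
         levels = c("Below", "Above"))
```

Because `sorted` and the by() output share the same group order, the new column lines up row for row. Passing `levels` explicitly keeps "Below" ahead of "Above" instead of the alphabetical default.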

Dirk Eddelbuettel 2009-08-11 12:37:03

Answer 2

+2 A:

Here is a solution using the plyr package.

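library(plyr)

# Example data: 15 random values spread over three factor levels
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))

# For each factor level, compute the median and label each row relative to it
ddply(myDataFrame, "myFactor", function(x) {
    x$Median <- median(x$myData)
    # Values at or below the group median are labelled "Below"
    x$FactorLevelMedianSplit <- factor(x$myData <= x$Median,
                                       levels = c(TRUE, FALSE),
                                       labels = c("Below", "Above"))
    x
})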

Thierry 2009-08-11 14:22:44

This worked great. See also the update to the post for a packageless way.

Dan Goldstein 2009-08-12 12:06:28
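
The update mentioned in the comment is not reproduced here. As a guess at what a packageless version could look like (not necessarily what the update contains), base R's ave() returns a per-group statistic aligned with the original rows, so no sorting is needed. The example data and the helper name `groupMedian` are assumptions for illustration.

```r
# Illustrative data; column names follow the thread, values are made up.
set.seed(1)
myDataFrame <- data.frame(myData = runif(15),
                          myFactor = rep(c("A", "B", "C"), 5))

# ave() computes the median within each factor level and returns it
# in the original row order, one value per row.
groupMedian <- ave(myDataFrame$myData, myDataFrame$myFactor, FUN = median)

# Same labelling rule as the plyr answer: at or below the median is "Below".
myDataFrame$FactorLevelMedianSplit <-
  factor(myDataFrame$myData <= groupMedian,
         levels = c(TRUE, FALSE), labels = c("Below", "Above"))
```

The data frame keeps its original A B C A B C ... row order, which sidesteps the reindexing problem raised in the first answer.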
