Let's say I have a data frame like this:

df <- data.frame(a=letters[1:26],1:26, stringsAsFactors=TRUE)

And I would like to recode the factor so that the levels "a", "b", and "c" all become "a".

How do I do that?

+3  A: 

One option is the recode() function in package car:

require(car)
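## stringsAsFactors=TRUE keeps a as a factor in R >= 4.0.0, where it is no longer the default
df <- data.frame(a=letters[1:26],1:26, stringsAsFactors=TRUE)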
df2 <- within(df, a <- recode(a, 'c("a","b","c")="a"'))
> head(df2)
  a X1.26
1 a     1
2 a     2
3 a     3
4 d     4
5 e     5
6 f     6

Here is an example where a is less simple (its values are in no particular order) and we recode several levels into one.

set.seed(123)
df3 <- data.frame(a = sample(letters[1:5], 100, replace = TRUE),
                  b = 1:100,
                  stringsAsFactors = TRUE)  # keep a as a factor in R >= 4.0.0
with(df3, head(a))
with(df3, table(a))

The last two lines give:

> with(df3, head(a))
[1] b d c e e a
Levels: a b c d e
> with(df3, table(a))
a
 a  b  c  d  e 
19 20 21 22 18

Now let's combine levels a and e into level Z using recode():

df4 <- within(df3, a <- recode(a, 'c("a","e")="Z"'))
with(df4, head(a))
with(df4, table(a))

which gives:

> with(df4, head(a))
[1] b d c Z Z Z
Levels: b c d Z
> with(df4, table(a))
a
 b  c  d  Z 
20 21 22 37

Doing this without spelling out the levels to merge:

## Select the levels you want (here 'a' and 'e')
lev.want <- with(df3, levels(a)[c(1,5)])
## now paste together
lev.want <- paste(lev.want, collapse = "','")
## then bolt on the extra bit
codes <- paste("c('", lev.want, "')='Z'", sep = "")
## then use within recode()
df5 <- within(df3, a <- recode(a, codes))
with(df5, table(a))

This gives the same result as df4 above:

> with(df5, table(a))
a
 b  c  d  Z 
20 21 22 37 
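
The string-building step above can be checked on its own, without loading car. The following is a minimal sketch, not part of the original answer: `make_recode_spec` is a hypothetical helper name, and it assumes the level labels contain no single quotes (which would break the quoting in the generated specification).

```r
## Hypothetical helper (not part of car): build a recode() specification
## string that maps every level in `old` to the single level `new`.
## Assumes no level label contains a single quote.
make_recode_spec <- function(old, new) {
  paste0("c('", paste(old, collapse = "','"), "')='", new, "'")
}

lev <- c("a", "b", "c", "d", "e")          # stands in for levels(df3$a)
spec <- make_recode_spec(lev[c(1, 5)], "Z")
print(spec)
## [1] "c('a','e')='Z'"
```

The resulting string should be usable wherever `codes` is passed to recode() above.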
Gavin Simpson
Does it have to be an ordered factor to be able to use this?
Brandon Bertelsen
@Brandon: no, and in the above example it wasn't ordered from the point of view of `with(df, is.ordered(a))`. I'll add another example showing something where `a` isn't as simple as your original.
Gavin Simpson
I meant in the right order, not ordered. You're right.
Brandon Bertelsen
@Brandon: Ok, great. So the extra example I added to my answer shows `recode()` working in such an un-ordered factor.
Gavin Simpson
Is there a way to reference the levels without typing out the label?
Brandon Bertelsen
Could you expand on what you mean? At some point you're going to have to state which levels you want to merge. So I probably don't quite get what you mean.
Gavin Simpson
Sorry, was being dense. I've edited my second example to show what I think you are after.
Gavin Simpson
I wasn't exactly clear either! Thanks for the detailed answer!
Brandon Bertelsen
+2  A: 

You could do something like:

df$a[df$a %in% c("a","b","c")] <- "a"

UPDATE: More complicated factors.

Data <- data.frame(a=sample(c("Less than $50,000","$50,000-$99,999",
  "$100,000-$249,999", "$250,000-$500,000"),20,TRUE),n=1:20,
  stringsAsFactors=TRUE)
rows <- Data$a %in% c("$50,000-$99,999", "$100,000-$249,999")
Data$a[rows] <- "$250,000-$500,000"
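
A small sketch of what this kind of assignment does to the factor's levels, assuming the column really is a factor. The vector `x` below is a made-up stand-in for `df$a`, not data from the question. The replacement value has to be an existing level; assigning a label that is not among the levels produces NA with a warning.

```r
## x is a hypothetical stand-in for df$a (a factor, not a character vector).
x <- factor(letters[1:5])

## "a" is already a level, so the subset assignment is allowed.
x[x %in% c("a", "b", "c")] <- "a"

levels(x)            # "b" and "c" are still levels, now with zero counts
table(x)

## Base R's droplevels() removes levels that no longer occur.
x2 <- droplevels(x)
levels(x2)
```

Whether the empty levels matter depends on what comes next: they show up in table() and in plots or models that iterate over levels.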
Joshua Ulrich
+1  A: 

There are two ways. If you don't want to drop the unused levels, i.e. "b" and "c", Joshua's solution is probably best.

If you want to drop the unused levels, then

df$a<-factor(ifelse(df$a%in%c("a","b","c"),"a",as.character(df$a)))

or

levels(df$a)<-ifelse(levels(df$a)%in%c("a","b","c"),"a",levels(df$a))
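
As a quick check that the two forms agree, here is a minimal sketch on a small made-up factor (`x` is illustrative, not the question's data). It relies on the base R behaviour that assigning duplicated names through `levels<-` merges the corresponding levels.

```r
x <- factor(c("b", "d", "c", "e", "e", "a"))

## Approach 1: rebuild the factor from recoded character values.
y1 <- factor(ifelse(x %in% c("a", "b", "c"), "a", as.character(x)))

## Approach 2: rename the levels in place; duplicated names are merged.
y2 <- x
levels(y2) <- ifelse(levels(y2) %in% c("a", "b", "c"), "a", levels(y2))

levels(y1)
levels(y2)
identical(as.character(y1), as.character(y2))
```

The second form only touches the levels attribute, so it avoids the round trip through character values.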
kohske