I have a factor whose levels include the empty string:

levels(mydata$phone_partner_products)
""  "dont_know"  "maybe_interesting"  "not_interesting"  "very_interesting"  "very_not_interesting"

If I make a frequency table with table(mydata$phone_partner_products), I get:

                            dont_know    maybe_interesting 
            3752                  226                 2907 
 not_interesting     very_interesting very_not_interesting 
            1404                 1653                 1065

How can I put the columns in a more meaningful order? How can I rename the empty-string ("") level?

Thanks a lot in advance.

Answer (+4 votes):

Use levels() to rename the factor level you want to change:

> ff <- as.factor(sample(c("foo", "", "bar"), 20, replace = TRUE))
> table(ff)
ff
    bar foo 
  6   8   6 
> levels(ff)
[1] ""    "bar" "foo"
> levels(ff)[1] <- "ooops"
> table(ff)
ff
ooops   bar   foo 
    6     8     6 
Dirk Eddelbuettel
Lovely solution. Thank you!
Libo Cannici
How about a safer way: `levels(ff)[levels(ff)==""] <- "ooops"`?
Marek
Nice, then I can programmatically rename the empty level if it is there!
Libo Cannici
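A minimal sketch of the safer rename suggested in the comments, run on a small made-up factor (the values are hypothetical, not the asker's data). Matching the level by name rather than by position means the line is harmless when there is no empty level, and the renamed level keeps its original position.

```r
# Small made-up factor containing empty strings (hypothetical data)
ff <- factor(c("foo", "", "bar", "", "foo"))
levels(ff)   # "" sorts first: "" "bar" "foo"

# Rename the empty level by matching its name, not by assuming it is first
levels(ff)[levels(ff) == ""] <- "ooops"
levels(ff)   # "ooops" "bar" "foo": renamed in place, position unchanged

# Running the same line again is a no-op: no level equals "" any more
levels(ff)[levels(ff) == ""] <- "ooops"
table(ff)
```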
Answer (+1 vote):

Importantly, do not use levels() to reorder the levels. A factor is stored as integer codes, and levels() only gives you access to the labels attached to those codes; assigning the labels in a different order renames the levels instead of reordering them.

> set.seed(20)
> x <- factor(sample(c("Men","Women"), 100, replace = TRUE))
> table(x)
x
  Men Women 
   57    43 
> levels(x) <- c("Women","Men")
> table(x)
x
Women   Men 
   57    43 

All you've done here is rename the levels. In the original sample there were 57 men, and you have now relabeled them as "Women". Neither the order of the levels nor the underlying codes has changed. Making this mistake could really wreck your whole analysis!
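The difference between relabeling and reordering can be checked directly by comparing the integer codes and the labels before and after the assignment. A small sketch with made-up data:

```r
x <- factor(c("Men", "Men", "Women"))   # hypothetical data; levels: "Men" "Women"
codes_before  <- as.integer(x)          # 1 1 2
labels_before <- as.character(x)        # "Men" "Men" "Women"

levels(x) <- c("Women", "Men")          # relabels codes 1 and 2; does NOT reorder

as.integer(x)     # still 1 1 2: the codes are untouched
as.character(x)   # "Women" "Women" "Men": every observation changed label
```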

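To actually reorder the levels, use the relevel() function. It moves the level named by its ref argument to the first position and keeps the remaining levels in their existing order. Note that ref must be a single level (relevel() stops with an error if you pass it a longer vector); to impose a complete new order, call factor() again with an explicit levels argument.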

> set.seed(20)
> x <- factor(sample(c("Men","Women"), 100, replace = TRUE))
> table(x)
x
  Men Women 
   57    43
> x <- relevel(x, "Women")
> table(x)
x 
Women   Men 
   43    57

This has done the appropriate thing: it changed the order of the levels, not just their names.
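Because relevel() only moves one level to the front, a full custom order, such as the one the question asks for, is easier to set with factor(..., levels = ...). The sketch below uses made-up responses with the question's level names; the order chosen in wanted is only an assumption about what "meaningful" means here. Any value whose level is missing from levels becomes NA, so every existing level (including "") must be listed.

```r
# Made-up responses using the level names from the question
resp <- factor(c("maybe_interesting", "", "very_interesting", "dont_know",
                 "not_interesting", "very_not_interesting", "maybe_interesting"))

# relevel() moves a single level to the front and keeps the rest in place
levels(relevel(resp, ref = "dont_know"))

# A full order (an assumed "meaningful" order) via factor(levels = ...)
wanted <- c("very_not_interesting", "not_interesting", "maybe_interesting",
            "very_interesting", "dont_know", "")
resp2 <- factor(resp, levels = wanted)

# Same observations, new column order in the frequency table
table(resp2)
```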

There is also a reorder() function, which properly reorders the levels of a factor according to a summary of another variable computed within each level (by default FUN = mean, with the levels sorted in increasing order of that summary).

> table(x)
x
Women   Men 
   43    57 
> set.seed(20)
> value <- rnorm(100)
> tapply(value, x, mean)
     Women        Men 
 0.1679080 -0.1180567 
> x <- reorder(x, value, mean)
> table(x)
x
  Men Women 
   57    43 
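A deterministic sketch of reorder() on made-up data, where the within-level means are easy to check by hand. Only the level order changes; each observation keeps its label.

```r
grp   <- factor(c("a", "b", "c", "a", "b", "c"))   # hypothetical groups
score <- c(5, 1, 3, 7, 2, 4)                        # hypothetical values
# Within-level means: a = 6, b = 1.5, c = 3.5

grp2 <- reorder(grp, score, FUN = mean)   # levels sorted by increasing mean
levels(grp2)          # "b" "c" "a"
attr(grp2, "scores")  # the per-level means used for the ordering
```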
JoFrhwld
Thank you for your exhaustive explanation!
Libo Cannici