Importantly, do not use levels() to reorder the levels. levels() only gives you access to the labels attached to the factor's underlying integer codes, so assigning new labels renames the levels without changing their order.
> set.seed(20)
> x <- factor(sample(c("Men","Women"), 100, replace = TRUE))
> table(x)
x
  Men Women
   57    43
> levels(x) <- c("Women","Men")
> table(x)
x
Women   Men
   57    43
All you've done here is rename the levels. The original sample contained 57 men, and those 57 observations are now labeled "Women." The order of the levels has not changed, but every observation now carries the wrong label. A mistake like this can silently wreck your entire analysis!
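The following is a minimal sketch of the same pitfall on a small hand-built vector, so that the effect can be checked by eye. The names `raw` and `renamed` are hypothetical and chosen only for this illustration.

```r
# Four observations, small enough to verify by hand.
raw <- c("Men", "Men", "Men", "Women")
x <- factor(raw)

# Assigning to levels() relabels the data instead of reordering the levels.
renamed <- x
levels(renamed) <- c("Women", "Men")

# Cross-tabulate the original strings against the relabeled factor.
table(raw, renamed)

as.character(renamed)   # "Women" "Women" "Women" "Men"

# A simple guard: a true reorder leaves the underlying values unchanged.
identical(as.character(renamed), raw)   # FALSE, the data were altered
```

Comparing `as.character()` before and after any level manipulation is one inexpensive way to catch this kind of accidental relabeling.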
To change the order of the levels, use the relevel() function. It moves the level matching the character argument to the first level position. The argument must name a single level; to specify a complete ordering, call factor() with an explicit levels argument instead.
> set.seed(20)
> x <- factor(sample(c("Men","Women"), 100, replace = TRUE))
> table(x)
x
  Men Women
   57    43
> x <- relevel(x, "Women")
> table(x)
x
Women   Men
   43    57
This does the appropriate thing: it changes the order of the levels, not just their names, so the counts stay attached to the correct groups.
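With more than two levels, the difference between moving one level to the front and setting a complete order becomes visible. The sketch below uses a made-up `size` factor (a hypothetical example, not data from the text) to contrast relevel() with factor() and an explicit levels argument.

```r
# A three-level factor; the default level order is alphabetical.
size <- factor(c("small", "large", "medium", "small", "large"))
levels(size)                        # "large" "medium" "small"

# relevel() moves one reference level to the front and keeps the rest in place.
size_ref <- relevel(size, ref = "small")
levels(size_ref)                    # "small" "large" "medium"

# factor() with an explicit levels argument sets the complete order.
size_ord <- factor(size, levels = c("small", "medium", "large"))
levels(size_ord)                    # "small" "medium" "large"

# Neither approach alters the underlying values.
identical(as.character(size_ref), as.character(size))   # TRUE
identical(as.character(size_ord), as.character(size))   # TRUE
```

relevel() is convenient when only the reference category matters, for example as the baseline in a regression; an explicit levels argument is the more direct choice when the whole order matters.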
There is also a reorder() function, which reorders the levels of a factor according to a summary statistic, such as the mean, of some other numeric variable computed within each level. Levels are sorted in increasing order of that statistic.
> table(x)
x
Women   Men
   43    57
> set.seed(20)
> value <- rnorm(100)
> tapply(value, x, mean)
     Women        Men
 0.1679080 -0.1180567
> x <- reorder(x, value, mean)
> table(x)
x
  Men Women
   57    43
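Because the example above depends on random draws, a small deterministic sketch may make the behavior of reorder() easier to verify. The `group` and `score` vectors are hypothetical values chosen so the group means are obvious.

```r
# Three groups with hand-picked scores.
group <- factor(c("a", "a", "b", "b", "c", "c"))
score <- c(5, 7, 1, 3, 9, 11)

tapply(score, group, mean)          # a = 6, b = 2, c = 10

# Levels are sorted by increasing group mean.
by_mean <- reorder(group, score, FUN = mean)
levels(by_mean)                     # "b" "a" "c"

# Negating the values is one simple way to get a descending order.
by_mean_desc <- reorder(group, -score, FUN = mean)
levels(by_mean_desc)                # "c" "a" "b"

# As with relevel(), only the level order changes, not the data.
identical(as.character(by_mean), as.character(group))   # TRUE
```

This kind of ordering is often useful before plotting, so that groups appear sorted by the quantity being displayed rather than alphabetically.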