Hello,

I have a data frame with some numerical values and factors for groups, treatments etc. The levels of those factors are not in the order I want.

numbers = 1:4
letters = factor(c("a", "b", "c", "d"))
df <- data.frame(numbers, letters)
  numbers letters
1       1       a
2       2       b
3       3       c
4       4       d

If I change the order of the levels, the letters are no longer paired with their corresponding numbers (my data is total nonsense from this point on).

levels(df$letters) <- c("d", "c", "b", "a")
  numbers letters
1       1       d
2       2       c
3       3       b
4       4       a

I simply want to change the order so that when plotting (with ggplot2) the bars are shown in the correct order (first the control, then the treatment etc.). There MUST be a quick way to change the order for such tasks; I run into this problem every time I do something with R. :(

A: 

This should work:

df$letters = factor(letters[1:4], labels=c("d", "c", "b", "a"), ordered=T)
doug
I'm not sure I understand this one. I tried it with `letters` being `[1] a b c b a` `Levels: a b c` and ran `df$letters <- factor(letters,labels=c("c", "b", "a"), ordered = T)`. But now the numbers and the letters are out of sync once again! Edit(s): OK, I'm new here. How can I format text as code in a comment? I thought 4 leading spaces... or ``. Doesn't work.
crangos
+3  A: 

Use the levels argument of factor:

> df <- data.frame(f = 1:4, g = factor(letters[1:4]))
> df
  f g
1 1 a
2 2 b
3 3 c
4 4 d
> levels(df$g)
[1] "a" "b" "c" "d"
> df$g <- factor(df$g, levels = letters[4:1])
> levels(df$g)
[1] "d" "c" "b" "a"
> df
  f g
1 1 a
2 2 b
3 3 c
4 4 d
Jonathan Chang
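
A quick way to convince yourself that the `levels` argument only changes the level order is to compare the values before and after. The following is a minimal sketch using the same toy data as the answer above; the variable name `before` is only illustrative.

```r
# Toy data mirroring the answer above (g is created as a factor explicitly)
df <- data.frame(f = 1:4, g = factor(letters[1:4]))

# Remember which letter sits in which row
before <- as.character(df$g)

# Reorder the levels; the values in the rows are not touched
df$g <- factor(df$g, levels = c("d", "c", "b", "a"))

levels(df$g)                           # "d" "c" "b" "a"
identical(as.character(df$g), before)  # TRUE: every row kept its letter
```

If the second check ever returns FALSE, the levels were renamed rather than reordered.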
Thank you, this worked. For some strange reason ggplot now correctly changed the order in the legend, but not in the plot. Weird.
crangos
ggplot2 required me to change both the order of the levels (see above) and the order of the rows in the data frame: `df <- df[nrow(df):1, ]  # reverse`
crangos
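
For reference, here is a minimal sketch of the plotting case from the question. The data frame `dat` and its group names are made up for illustration. It assumes a reasonably recent ggplot2 (one that provides `geom_col()`), where a discrete x axis follows the factor's level order, so reversing the rows should not be needed; older versions may have behaved differently, as the comment above suggests.

```r
# Hypothetical data: one bar per group, "control" should be drawn first
dat <- data.frame(
  group = c("treatment", "control", "placebo"),
  value = c(3, 1, 2)
)

# Set the level order explicitly; the rows stay where they are
dat$group <- factor(dat$group, levels = c("control", "placebo", "treatment"))

# Guarded so the sketch still runs where ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(dat, ggplot2::aes(x = group, y = value)) +
    ggplot2::geom_col()  # bars are placed in level order on the x axis
  # print(p) draws the plot
}
```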
+1  A: 

Some more options, just for the record:

library(gdata)
df$letters <- reorder(df$letters, new.order=letters[4:1])

library(Hmisc)
df$letters <- reorder.factor(df$letters, letters[4:1])

You may also find Relevel and combine_factor useful.

gd047
A: 

Dealing with factors in R is quite a peculiar job, I must admit... When you assign new levels with levels(), you're not reordering the underlying numerical values, you're only renaming the labels attached to them. Here's a little demonstration:

> numbers = 1:4
> letters = factor(letters[1:4])
> dtf <- data.frame(numbers, letters)
> dtf
  numbers letters
1       1       a
2       2       b
3       3       c
4       4       d
> sapply(dtf, class)
  numbers   letters 
"integer"  "factor" 

Now, if you convert this factor to numeric, you'll get:

# return underlying numerical values
> with(dtf, as.numeric(letters))
[1] 1 2 3 4
# change levels
> levels(dtf$letters) <- letters[4:1]
> dtf
  numbers letters
1       1       d
2       2       c
3       3       b
4       4       a
# return numerical values once again
> with(dtf, as.numeric(letters))
[1] 1 2 3 4

As you can see, by assigning to levels you change the level labels only (who would have guessed, eh?), not the numerical values! But when you use the factor function, as @Jonathan Chang suggested, something different happens: the numerical values themselves change, so each row keeps its original label.
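
The two behaviours can be put side by side. This is a small sketch with illustrative names (`renamed`, `reordered`), not part of the original transcript:

```r
g <- factor(c("a", "b", "c", "d"))

# Variant 1: assigning to levels() renames the labels, the codes stay put
renamed <- g
levels(renamed) <- c("d", "c", "b", "a")
as.integer(renamed)      # 1 2 3 4
as.character(renamed)    # "d" "c" "b" "a"  (the data is now scrambled)

# Variant 2: factor(levels = ) reorders the levels, the codes are remapped
reordered <- factor(g, levels = c("d", "c", "b", "a"))
as.integer(reordered)    # 4 3 2 1
as.character(reordered)  # "a" "b" "c" "d"  (the data is intact)
```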

You're getting scrambled data once again because you first assign to levels and then try to relevel with factor. Don't do it!!! Do not assign to levels or you'll mess things up (unless you know exactly what you're doing).

One lil' suggestion: avoid giving your objects the same names as existing R objects (df is the density function of the F distribution, letters holds the lowercase letters of the alphabet). In this particular case your code is not faulty, but sometimes it can be, and it creates confusion, and we don't want that, do we?!? =)

Instead, use something like this (I'll go from the beginning once again):

> dtf <- data.frame(f = 1:4, g = factor(letters[1:4]))
> dtf
  f g
1 1 a
2 2 b
3 3 c
4 4 d
> with(dtf, as.numeric(g))
[1] 1 2 3 4
> dtf$g <- factor(dtf$g, levels = letters[4:1])
> dtf
  f g
1 1 a
2 2 b
3 3 c
4 4 d
> with(dtf, as.numeric(g))
[1] 4 3 2 1

Note that you can also name your data.frame df and use letters instead of g, and the result will be OK. Actually, this code is identical to the one you posted, only the names are changed. The part factor(dtf$letters, levels = letters[4:1]) wouldn't throw an error, but it can be confusing!

Read the ?factor manual thoroughly! What's the difference between factor(g, levels = letters[4:1]) and factor(g, labels = letters[4:1])? What's similar in levels(g) <- letters[4:1] and g <- factor(g, labels = letters[4:1])?
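
As a hint for those questions, here is a minimal sketch (the names `by_levels`, `by_labels` and `by_assign` are only illustrative) that builds all three variants from the same factor so they can be compared:

```r
g <- factor(c("a", "b", "c", "d"))
new_order <- c("d", "c", "b", "a")

by_levels <- factor(g, levels = new_order)  # same values, new level order
by_labels <- factor(g, labels = new_order)  # existing levels get renamed

by_assign <- g
levels(by_assign) <- new_order              # also renames the levels

# labels = and levels<- give the same (scrambled) values
identical(as.character(by_labels), as.character(by_assign))  # TRUE

# levels = keeps the original values
identical(as.character(by_levels), as.character(g))          # TRUE
```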

If you post your ggplot code, we can help you more with this one!

Cheers!!!

Edit:

ggplot2 actually requires you to change both levels and values? Hm... I'll dig into this one...

aL3xa