When I convert a factor to a numeric, the values change to rank values.

R> m$obs
 [1] 0  0  1  1  1  1  3  3  3  3  3  3  3  9  9  9  9  9  9  9  9  9  11 11 12 13 13 13 13 13 
 13 13 14
Levels: 0 1 3 9 11 12 13 14

R> as.numeric(m$obs)
 [1] 1 1 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 5 5 6 7 7 7 7 7 7 7 8

I have to resort to paste() to get the real values.

R> paste(m$obs)
 [1] "0"  "0"  "1"  "1"  "1"  "1"  "3"  "3"  "3"  "3"  "3"  "3"  "3"  "9"  "9"  "9"  "9" "9"
 "9"  "9"  "9"  "9"  "11" "11" "12" "13" "13" "13" "13" "13" "13" "13" "14"
R> as.numeric(paste(m$obs))
 [1]  0  0  1  1  1  1  3  3  3  3  3  3  3  9  9  9  9  9  9  9  9  9 11 11 12 13 13 13 13 13 
 13 13 14

Is there a simpler way to convert a factor to numeric? Thanks!

+8  A: 

See the Warning section of ?factor:

In particular, ‘as.numeric’ applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor ‘f’ to approximately its original numeric values, ‘as.numeric(levels(f))[f]’ is recommended and slightly more efficient than ‘as.numeric(as.character(f))’.
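A minimal sketch of the two idioms named in that warning, assuming a small factor `f` whose values are made up for illustration and only resemble the question's `m$obs`:

```r
# Hypothetical factor with numeric-looking levels (not the asker's actual data)
f <- factor(c(0, 0, 1, 3, 9, 11, 13, 14))

# as.numeric() on a factor returns the internal integer codes, not the labels
codes <- as.numeric(f)

# Recommended in ?factor: convert the levels once, then index by the factor
via_levels <- as.numeric(levels(f))[f]

# Alternative: convert every element to character first, then to numeric
via_char <- as.numeric(as.character(f))

print(codes)
print(via_levels)
print(identical(via_levels, via_char))
```

Both idioms recover the original values. The first converts only the unique levels, which is why the help page calls it slightly more efficient.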

Joshua Ulrich
A: 

I disagree with the technique recited in the accepted answer.

An explicit conversion step is unnecessary here, and therefore the risk of losing any information can be avoided entirely.

R's internal representation for factors is an integer array (plus another array to map the names to those integers).

Therefore, you can use the unclass() function to access that internal representation, which of course means that your 'unclassed' variable will now be of class 'integer' (which you can easily convert to 'numeric' by passing it to 'as.numeric()').

E.g.,

my_factor_as_numeric = unclass(myDataFrame$aFactor)

The OP's Question is not how to convert a factor, but how to do so with maximum fidelity--and in this instance the best way to do that is not to 'convert' it at all but to access its internal representation, particularly since that internal representation happens to be of the same data type as the intended conversion target.

The accepted answer requires A => B => C (A = R's internal data type; B = interface (character representation); C = numeric representation), but 'A' and 'C' are the same. It is always better to avoid an explicit conversion step and instead access the underlying data representation.

doug
But the question was how to convert the levels to numeric, not how to access the internal values.
Joshua Ulrich
The example data looks like this: `x <- factor(sort(rep(0:13, 3)))`. `unclass(x)` produces integers from 1 to 14, which is not the same as the original data. That is, A != C. If `y <- unclass(x)`, then to get the original data, you need to do `as.numeric(attr(y,"levels")[y])`, which is basically given in the accepted answer as `as.numeric(levels(x)[x])`
JoFrhwld
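
The point made in the comments can be checked with a short sketch. It assumes a factor `obs` rebuilt from the values and counts printed in the question. The name `obs` and the `rep()` construction are illustrative, not the asker's actual code:

```r
# Rebuild a factor like the question's m$obs from its printed values and counts
vals   <- c(0, 1, 3, 9, 11, 12, 13, 14)
counts <- c(2, 4, 7, 9, 2, 1, 7, 1)
obs    <- factor(rep(vals, times = counts))

# unclass() exposes the integer codes, with the levels kept as an attribute
y <- unclass(obs)
print(range(y))            # codes run from 1 to nlevels(obs)
print(attr(y, "levels"))   # the labels, stored as character strings

# The codes are not the original values; mapping back goes through the levels
recovered <- as.numeric(attr(y, "levels")[y])
print(identical(recovered, rep(vals, times = counts)))
```

Going through the `levels` attribute after `unclass()` is the same lookup as `as.numeric(levels(obs))[obs]`, so the integer codes alone cannot stand in for the original numbers.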