
What is the quickest/best way to change a large number of columns to numeric from factor?

I used the following code but it appears to have re-ordered my data.

> head(stats[,1:2])
  rk                 team
1  1 Washington Capitals*
2  2     San Jose Sharks*
3  3  Chicago Blackhawks*
4  4     Phoenix Coyotes*
5  5   New Jersey Devils*
6  6   Vancouver Canucks*

for(i in c(1,3:ncol(stats))) {
    stats[,i] <- as.numeric(stats[,i])
}

> head(stats[,1:2])
  rk                 team
1  2 Washington Capitals*
2 13     San Jose Sharks*
3 24  Chicago Blackhawks*
4 26     Phoenix Coyotes*
5 27   New Jersey Devils*
6 28   Vancouver Canucks*

What is the best way, short of naming every column as in:

df$colname <- as.numeric(df$colname)
+1  A: 

You have to be careful when changing factors to numeric. Here is a line of code that changes a set of columns from factor to numeric. I am assuming the columns to be converted are 1, 3, 4 and 5; change them as needed.

cols <- c(1, 3, 4, 5)
df[, cols] <- apply(df[, cols], 2, function(x) as.numeric(as.character(x)))
Ramnath
This won't work correctly. Example: `x<-as.factor(1:3); df<-data.frame(a=x,y=runif(3),b=x,c=x,d=x)`. I don't think that `apply` is appropriate for this kind of problem.
Marek
`apply` works perfectly in these situations. The error in my code was using `MARGIN = 1` instead of `2`; the function needs to be applied column-wise. I have edited my answer accordingly.
Ramnath
@Ramnath Now it works. But I think it could be done without `apply`. Check my edit.
Marek
... or Joris's answer with `unlist`. Also, the `as.character` conversion in your solution is not needed, because `apply` converts `df[,cols]` to a `character` matrix, so `apply(df[,cols], 2, function(x) as.numeric(x))` will work too.
Marek
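A minimal sketch of the corrected `apply` call, run on the test data from Marek's first comment (factor columns at positions 1, 3, 4 and 5, a numeric column `y` at position 2). It also checks the point from the last comment: since `apply` first turns `df[, cols]` into a character matrix, plain `as.numeric` gives the same result. The names `with_char` and `without_char` are only for illustration.

```r
# Test data from Marek's comment: columns 1, 3, 4, 5 are factors, column 2 is numeric
x <- as.factor(1:3)
df <- data.frame(a = x, y = runif(3), b = x, c = x, d = x)
cols <- c(1, 3, 4, 5)

# MARGIN = 2 applies the function to each column
with_char <- apply(df[, cols], 2, function(x) as.numeric(as.character(x)))

# apply() calls as.matrix() first, which turns factor columns into character,
# so the as.character() step is redundant here
without_char <- apply(df[, cols], 2, as.numeric)

# Assigning the numeric matrix back replaces the factor columns
df[, cols] <- with_char
str(df)
```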
+3  A: 

Further to Ramnath's answer: the behaviour you are seeing is due to as.numeric(x) returning the internal integer codes of the factor x, not its levels. If you want to keep the numbers that are the levels of the factor (rather than their internal representation), convert to character with as.character() first, as in Ramnath's example.
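A small illustration of the difference, using a made-up factor `f`. The levels are sorted as strings, so the internal codes index into that sorted order, which is why the converted values can look shuffled:

```r
# A factor whose labels look like numbers
f <- factor(c("10", "5", "20"))

levels(f)                    # "10" "20" "5": levels are sorted as strings
as.numeric(f)                # 1 3 2: the internal integer codes
as.numeric(as.character(f))  # 10 5 20: the labels themselves
```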

Your for loop is just as reasonable as an apply call, and it may make the intention of the code slightly clearer. Just change this line:

stats[,i] <- as.numeric(stats[,i])

to read

stats[,i] <- as.numeric(as.character(stats[,i]))
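A self-contained sketch of the fixed loop. The `stats` data frame here is a hypothetical three-column stand-in with made-up values, not the real data; column 2 holds the team names and is skipped, as in the question:

```r
# Hypothetical stand-in for `stats`: made-up values, column 2 is text
stats <- data.frame(
  rk   = factor(c("1", "2", "10")),
  team = c("Washington Capitals*", "San Jose Sharks*", "Chicago Blackhawks*"),
  gp   = factor(c("82", "82", "81")),
  stringsAsFactors = FALSE
)

# Without as.character(), rk would become the codes 1 3 2 (levels are "1" "10" "2")
for (i in c(1, 3:ncol(stats))) {
  stats[, i] <- as.numeric(as.character(stats[, i]))
}

stats$rk  # 1 2 10
```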

This is FAQ 7.10 in the R FAQ.

HTH

Gavin Simpson
No need for any kind of loop. Just use the indices and unlist(). Edit: I added an answer illustrating this.
Joris Meys
+1  A: 

I think that ucfagls found why your loop is not working.

In case you still don't want to use a loop, here is a solution with lapply:

factorToNumeric <- function(f) as.numeric(levels(f))[as.integer(f)] 
cols <- c(1, 3:ncol(stats))
stats[cols] <- lapply(stats[cols], factorToNumeric)
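A quick check of `factorToNumeric` on made-up data. It converts each distinct level once and then indexes by the integer codes, which gives the same result as `as.numeric(as.character(f))` while converting fewer strings when values repeat. The small data frame `df` is hypothetical:

```r
factorToNumeric <- function(f) as.numeric(levels(f))[as.integer(f)]

f <- factor(c("3.5", "1", "3.5", "10"))
factorToNumeric(f)  # 3.5, 1, 3.5, 10

# Same pattern over selected columns of a (hypothetical) data frame
df <- data.frame(a = factor(c("2", "10")), b = c("x", "y"), c = factor(c("0.5", "7")))
cols <- c(1, 3)
df[cols] <- lapply(df[cols], factorToNumeric)
```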

Edit: I found a simpler solution. It seems that as.matrix converts a data frame of factors to a character matrix. So

stats[cols] <- as.numeric(as.matrix(stats[cols]))

should do what you want.
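A sketch of the `as.matrix` route on a hypothetical two-column data frame. It assumes every selected column is a factor: `as.matrix` then yields the factor labels as characters, and the plain numeric vector is filled back into the columns column by column:

```r
df <- data.frame(
  a = factor(c("2", "10", "7")),
  b = factor(c("0.5", "0.5", "3"))
)
cols <- c("a", "b")

m <- as.matrix(df[cols])  # character matrix of the factor labels
typeof(m)                 # "character"

# as.numeric() drops the dim attribute; [<- refills the columns in column order
df[cols] <- as.numeric(m)
```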

Marek
+1  A: 

This can be done in one line; there's no need for a loop, be it a for loop or an apply. Use unlist() instead:

# testdata
Df <- data.frame(
  x = as.factor(sample(1:5, 30, replace = TRUE)),
  y = as.factor(sample(1:5, 30, replace = TRUE)),
  z = as.factor(sample(1:5, 30, replace = TRUE)),
  w = as.factor(sample(1:5, 30, replace = TRUE))
)
##

Df[,c("y","w")] <- as.numeric(as.character(unlist(Df[,c("y","w")])))

str(Df)
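Why this works: when every element of the list is a factor, `unlist()` returns a single factor whose levels are the union of the column levels, so `as.character()` recovers the labels. A minimal sketch on a hypothetical two-column frame `d` (with `use.names = FALSE`, as Marek suggests in the comments):

```r
# Two factor columns with different level sets
d <- data.frame(y = factor(c("1", "5")), w = factor(c("2", "5")))

u <- unlist(d, use.names = FALSE)
class(u)   # "factor"
levels(u)  # union of the column levels: "1", "5", "2" in order of appearance

# The vector of length nrow * 2 is filled back column by column
d[, c("y", "w")] <- as.numeric(as.character(u))
```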

Edit: for your code, this becomes:

id <- c(1, 3:ncol(stats))
stats[,id] <- as.numeric(as.character(unlist(stats[,id])))
Joris Meys
A small improvement would be to set both the `recursive` and `use.names` arguments of `unlist` to `FALSE`.
Marek
@Marek: true. I love this game :-)
Joris Meys