ansaurus

Question

How to use R Random forests to reduce attributes having no discrete classes?

Answer 1

+2 A:

That should be no problem: when the response is numeric, RF simply switches to regression mode. Use the randomForest function from the randomForest package.
To get object similarity, pass the proximity=TRUE argument:

randomForest(Sepal.Length~.,data=iris,proximity=TRUE)$proximity
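As a rough illustration of what the proximity matrix can be used for, the sketch below turns it into a dissimilarity and clusters the objects. Treating 1 - proximity as a distance, the average-linkage clustering, the seed, and the cut into 3 groups are all assumptions made for the example, not part of the original answer.

```r
library(randomForest)

set.seed(7)  # forests are random; fix the seed for repeatable output

# Regression forest on a numeric response, keeping the proximity matrix
rf <- randomForest(Sepal.Length ~ ., data = iris, proximity = TRUE)
prox <- rf$proximity                 # one row and one column per object

# Assumption: use 1 - proximity as a dissimilarity between objects
d <- as.dist(1 - prox)

# Hierarchical clustering on that dissimilarity
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)          # hypothetical choice of 3 groups

# Compare the groups with the species labels
print(table(groups, iris$Species))
```

The same dissimilarity could be passed to cmdscale() for a 2-D plot instead of clustering.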

To get node-purity (Gini-index like) attribute importance:

randomForest(Sepal.Length~.,data=iris)$importance[,"IncNodePurity"]

To get mean MSE increase (accuracy-decrease like) attribute importance:

randomForest(Sepal.Length~.,data=iris,importance=TRUE)$importance[,"%IncMSE"]
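To actually reduce the attribute set, one option is to rank the attributes by importance and keep the top few. This is only a minimal sketch: the seed and the cut-off of two attributes are arbitrary assumptions, and the names imp, keep and reduced are made up for the example.

```r
library(randomForest)

set.seed(42)  # forests are random; fix the seed for repeatable rankings

# Regression forest: Sepal.Length is numeric, so no discrete classes are needed
rf <- randomForest(Sepal.Length ~ ., data = iris, importance = TRUE)

# Permutation (MSE increase) importance, sorted from most to least important
imp <- sort(rf$importance[, "%IncMSE"], decreasing = TRUE)
print(imp)

# Hypothetical cut-off: keep the two top-ranked attributes plus the response
keep <- names(imp)[1:2]
reduced <- iris[, c(keep, "Sepal.Length")]
str(reduced)
```

A fixed cut-off is the crudest choice; in practice the threshold would be picked by checking how the OOB error changes as attributes are dropped.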

mbq 2010-07-07 20:35:02

Variable importance would work better for me than proximity. Maybe the description of my classification attribute was a bit misleading, but now I know what I should explore.

pixel 2010-07-07 21:07:45

Aw, you meant variable==attribute...

mbq 2010-07-07 21:12:27

I have extended the answer to cover that.

mbq 2010-07-07 21:20:39

Ok, I've also changed the question to use 'data-mining' vocabulary. Shouldn't you use `importance=TRUE` also in the second statement?

pixel 2010-07-07 21:35:49

No, because node purity is calculated while the forest is being built, so randomForest always creates this element, whereas the MSE increase requires permuting each attribute and re-predicting the OOB objects.

mbq 2010-07-07 21:47:03
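The difference between the two calls can be checked by looking at which columns the importance matrix contains. This is a small sketch; the object names rf_default and rf_perm and the seed are assumptions for the example.

```r
library(randomForest)

set.seed(1)

# Default call: only the node-purity measure is stored
rf_default <- randomForest(Sepal.Length ~ ., data = iris)

# importance = TRUE: the permutation-based measure is computed as well
rf_perm <- randomForest(Sepal.Length ~ ., data = iris, importance = TRUE)

# Compare the available importance columns
print(colnames(rf_default$importance))
print(colnames(rf_perm$importance))
```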

You can carry out random-forest-based variable elimination using the varSelRF package: http://bm2.genes.nig.ac.jp/RGM2/R_current/library/varSelRF/man/plot.varSelRF.html

gd047 2010-07-08 09:09:37
