I want to use random forests for attribute reduction. One problem with my data is that I don't have a discrete class, only a continuous one that indicates how much an example differs from 'normal'. This class attribute is a kind of distance, ranging from zero to infinity. Is there any way to use a random forest on such data?
views: 99
answers: 1
+2
A:
That should be no problem: RF will just switch to regression mode. Use the `randomForest` function from the randomForest package.
To get object similarity, pass the `proximity=TRUE` argument, like:
randomForest(Sepal.Length~.,data=iris,proximity=TRUE)$proximity
To get node-purity (Gini-index like) attribute importance:
randomForest(Sepal.Length~.,data=iris)$importance[,"IncNodePurity"]
To get mean MSE increase (accuracy-decrease like) attribute importance:
randomForest(Sepal.Length~.,data=iris,importance=TRUE)$importance[,"%IncMSE"]
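For attribute reduction, one possible way to use these scores is to rank the attributes and refit on the top few. The following is only a minimal sketch under stated assumptions: the cutoff `k` and the names `rf`, `imp`, `selected`, `reduced` and `rf_reduced` are hypothetical, and keeping a fixed number of top-ranked attributes is just one simple heuristic, not a method prescribed by the package.

```r
library(randomForest)

set.seed(42)  # make the forest reproducible

# Regression forest: the response (Sepal.Length) is numeric,
# so no discrete class labels are needed.
rf <- randomForest(Sepal.Length ~ ., data = iris, importance = TRUE)

# Permutation importance (mean increase in OOB MSE), largest first.
imp <- sort(rf$importance[, "%IncMSE"], decreasing = TRUE)

# Hypothetical cutoff: keep the k top-ranked attributes.
k <- 2
selected <- names(imp)[seq_len(k)]

# Refit on the reduced attribute set.
reduced <- iris[, c("Sepal.Length", selected)]
rf_reduced <- randomForest(Sepal.Length ~ ., data = reduced)
```

Comparing the OOB error of `rf` and `rf_reduced` (for example by printing both objects) gives a rough idea of how much is lost by dropping the remaining attributes.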
mbq
2010-07-07 20:35:02
Variable importance would work better for me than proximity. Maybe the description of my classification attribute was a bit misleading, but now I know what I should explore.
pixel
2010-07-07 21:07:45
Aw, you meant variable==attribute...
mbq
2010-07-07 21:12:27
I have extended the answer to cover that.
mbq
2010-07-07 21:20:39
OK, I've also changed the question to use 'data-mining' vocabulary. Shouldn't you also use `importance=TRUE` in the second statement?
pixel
2010-07-07 21:35:49
No, because node purity is calculated while the forest is being built, so randomForest always creates this element, whereas the MSE increase requires shuffling each attribute and re-predicting the OOB objects.
mbq
2010-07-07 21:47:03
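The difference described in the comment above can be checked by looking at the columns of the `importance` element with and without `importance=TRUE`. This is a small sketch on the same `iris` example; the object names `rf_default` and `rf_perm` are made up for illustration.

```r
library(randomForest)

set.seed(1)  # make both forests reproducible

# Default call: only the node-purity measure is stored.
rf_default <- randomForest(Sepal.Length ~ ., data = iris)

# With importance=TRUE the permutation-based measure is computed as well.
rf_perm <- randomForest(Sepal.Length ~ ., data = iris, importance = TRUE)

# Compare which importance columns each forest carries.
print(colnames(rf_default$importance))
print(colnames(rf_perm$importance))
```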
You can carry out variable elimination with random forests using the varSelRF package: http://bm2.genes.nig.ac.jp/RGM2/R_current/library/varSelRF/man/plot.varSelRF.html
gd047
2010-07-08 09:09:37