I want to use random forests for attribute reduction. One problem with my data is that I don't have a discrete class, only a continuous one, which indicates how much an example differs from 'normal'. This class attribute is a kind of distance, ranging from zero to infinity. Is there any way to use a random forest on such data?

+2  A: 

That should be no problem: with a numeric response, RF will just switch to regression mode. Use the randomForest function from the R randomForest package.
To get object similarity, pass the proximity=TRUE argument, like:

randomForest(Sepal.Length~.,data=iris,proximity=TRUE)$proximity

To get node-purity attribute importance (the regression analogue of Gini importance, based on the decrease in residual sum of squares):

randomForest(Sepal.Length~.,data=iris)$importance[,"IncNodePurity"]

To get mean MSE increase (accuracy-decrease like) attribute importance:

randomForest(Sepal.Length~.,data=iris,importance=TRUE)$importance[,"%IncMSE"]
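
To turn the ranking into actual attribute reduction, one possible approach is to keep only the highest-ranked attributes and refit. The following is a minimal sketch rather than a recommended procedure: the cutoff `k` and the names `ranked`, `kept`, `reduced` and `rf_small` are made up for illustration, and the seed is fixed only to make the ranking repeatable.

```r
library(randomForest)

set.seed(42)  # forests are randomized; fix the seed for a repeatable ranking

# Regression forest: the response is numeric, so no discrete class is needed
rf <- randomForest(Sepal.Length ~ ., data = iris, importance = TRUE)

# Rank attributes by the mean increase in OOB MSE after permutation
imp <- rf$importance[, "%IncMSE"]
ranked <- sort(imp, decreasing = TRUE)

# Hypothetical cutoff: keep the k highest-ranked attributes
k <- 2
kept <- names(ranked)[seq_len(k)]

# Refit on the reduced attribute set
reduced <- iris[, c("Sepal.Length", kept)]
rf_small <- randomForest(Sepal.Length ~ ., data = reduced)
```

Comparing the OOB error of `rf` and `rf_small` (printed with the fitted object) gives a rough idea of how much was lost by dropping attributes; how to choose `k` depends on the data.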
mbq
Variable importance would work better for me than proximity. Maybe the description of my classification attribute was a bit misleading, but now I know what I should explore.
pixel
Aw, you meant variable==attribute...
mbq
I have extended the answer to cover that.
mbq
Ok, I've also changed the question to use 'data-mining' vocabulary. Shouldn't you also use `importance=TRUE` in the second statement?
pixel
No, because node purity is calculated while the forest is being built, so randomForest always creates this element, while MSE increase needs permuting each attribute and re-predicting the OOB objects, which is only done when `importance=TRUE` is set.
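
A quick way to check this is to compare the columns of the importance matrix for the two calls. This is only a small sketch; the object names `rf_default` and `rf_perm` are made up for illustration.

```r
library(randomForest)

set.seed(1)  # only for repeatability; the column names do not depend on it

# Default call: only the node-purity measure is computed
rf_default <- randomForest(Sepal.Length ~ ., data = iris)

# With importance=TRUE the permutation-based measure is added
rf_perm <- randomForest(Sepal.Length ~ ., data = iris, importance = TRUE)

# Columns available in each importance matrix
cols_default <- colnames(rf_default$importance)
cols_perm <- colnames(rf_perm$importance)
print(cols_default)
print(cols_perm)
```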
mbq
You can carry out variable elimination with random forests using the varSelRF package: http://bm2.genes.nig.ac.jp/RGM2/R_current/library/varSelRF/man/plot.varSelRF.html
gd047