
Is this even possible? My training dataset has about 1,500 entries. randomForest grew 10,000 trees, each on a bootstrap sample drawn from the original dataset, and used the entries left out of each sample (the out-of-bag entries) to estimate the error. I have a separate, unclassified dataset, and I would like to apply those 10,000 trees to it to predict a class for each new entry. Is there an easy way to run the underlying trees of the forest on this new unclassified dataset?

A:

Have a look at Max Kuhn's caret package, which is designed to support exactly this, as its title (Classification and Regression Training) says.

It wraps randomForest as well as numerous other modelling packages, and it has ample documentation, including this JSS paper.
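As a rough sketch of the caret route, assuming both caret and randomForest are installed: `train()` with `method = "rf"` fits and tunes the forest, and `predict()` with `newdata` classifies rows it has not seen. The 5-fold cross-validation, `ntree = 200` and the three-row `new_data` frame are illustrative choices, not part of the original answer.

```r
library(caret)  # method = "rf" also requires the randomForest package

set.seed(111)
# Illustrative choice: 5-fold cross-validation to tune mtry
ctrl <- trainControl(method = "cv", number = 5)

# Extra arguments such as ntree are passed through to randomForest()
fit <- train(Species ~ ., data = iris, method = "rf",
             trControl = ctrl, ntree = 200)

# Hypothetical "unclassified" data: predictor columns only, no Species
new_data <- iris[c(1, 51, 101), -5]

# Class predictions for the new rows
pred <- predict(fit, newdata = new_data)
print(pred)
```

The returned `fit` keeps the final forest, so it can be reused on any later batch of unclassified rows without retraining.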

Besides caret, you can of course just call predict() on the model that randomForest() returns, as this example from the help page shows:

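 library(randomForest)
 data(iris)
 set.seed(111)
 # Randomly assign ~80% of rows to group 1 (training) and ~20% to group 2 (test)
 ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
 # Fit the forest on the training rows only
 iris.rf <- randomForest(Species ~ ., data = iris[ind == 1, ])
 # Predict the held-out rows and cross-tabulate against the true labels
 iris.pred <- predict(iris.rf, iris[ind == 2, ])
 table(observed = iris[ind == 2, "Species"], predicted = iris.pred)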

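Instead of drawing a random split with ind, just subset the data yourself: fit the forest on your labelled training set, then pass your separate, unclassified data frame to predict() as newdata. That data frame needs the same predictor columns (same names and types) as the training data.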
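A minimal sketch of that workflow, assuming the randomForest package: `iris` stands in for the labelled training set, `new_df` is a hypothetical unclassified data frame, and `ntree = 500` is used instead of the question's 10,000 to keep it quick. Passing `predict.all = TRUE` also returns each individual tree's prediction, which is one way to "index" the underlying trees on the new rows.

```r
library(randomForest)

set.seed(111)
# Labelled training data (stand-in for the ~1,500-row training set)
train_df <- iris

# Hypothetical unclassified data: same predictor columns, no Species column
new_df <- data.frame(
  Sepal.Length = c(5.0, 6.0, 6.9),
  Sepal.Width  = c(3.4, 2.8, 3.1),
  Petal.Length = c(1.5, 4.5, 5.6),
  Petal.Width  = c(0.2, 1.4, 2.1)
)

# Grow the forest once on the labelled data (the question used ntree = 10000)
rf <- randomForest(Species ~ ., data = train_df, ntree = 500)

# Majority-vote class for each new row
pred <- predict(rf, newdata = new_df)

# Share of tree votes per class for each new row
probs <- predict(rf, newdata = new_df, type = "prob")

# Per-tree predictions: $aggregate is the forest vote,
# $individual has one column per tree
all_pred <- predict(rf, newdata = new_df, predict.all = TRUE)
dim(all_pred$individual)  # 3 rows x 500 trees
```

The forest is fitted only once; the same `rf` object can then be applied to any number of new data frames.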

Dirk Eddelbuettel