ansaurus

Question

Using r and weka. How can I use meta-algorithms along with nfold evaluation method?

Answer 1

+2 A:

I guess you misinterprete the function of evaluate_Weka_classifier. In both cases, evaluate_Weka_classifier does only the cross-validation based on the training data. It doesn't change the model itself. Compare the confusion matrices of following code:

m<-J48(Species~., data=iris)
e<-evaluate_Weka_classifier(m,numFolds = 5)
summary(m)
e


m2 <- AdaBoostM1(Species ~. , data = iris ,
       control = Weka_control(W = list(J48, M = 30)))
e2 <- evaluate_Weka_classifier(m2,numFolds = 5)
summary(m2)
e2

In both cases, the summary gives you the evaluation based on the training data, while the function evaluate_Weka_classifier() gives you the correct crossvalidation. Neither for J48 nor for AdaBoostM1 the model itself gets updated based on the crossvalidation.

Now regarding the AdaBoost algorithm itself : In fact, it does use some kind of "weighted crossvalidation" to come to the final classifier. Wrongly classified items are given more weight in the next building step, but the evaluation is done using equal weight for all observations. So using crossvalidation to optimize the result doesn't really fit into the general idea behind the adaptive boosting algorithm.

If you want a true crossvalidation using a training set and a evaluation set, you could do the following :

id <- sample(1:length(iris$Species),length(iris$Species)*0.5)
m3 <- AdaBoostM1(Species ~. , data = iris[id,] ,
      control = Weka_control(W = list(J48, M=5)))

e3 <- evaluate_Weka_classifier(m3,numFolds = 5)
# true crossvalidation
e4 <- evaluate_Weka_classifier(m3,newdata=iris[-id,])

summary(m3)
e3
e4

If you want a model that gets updated based on a crossvalidation, you'll have to go to a different algorithm, eg randomForest() from the randomForest package. That collects a set of optimal trees based on crossvalidation. It can be used in combination with the RWeka package as well.

edit : corrected code for a true crossvalidation. Using the subset argument has effect in the evaluate_Weka_classifier() as well.

Joris Meys 2010-10-06 14:11:27

Thanks a lot for all your information, your reply is excelent, I really apreciate it.

mariana soffer 2010-10-24 22:47:33

@mariana: You're welcome. If the answer solved your problem, you can always indicate it as the accepted answer (the V sign on the left of the question). Cheers

Joris Meys 2010-10-25 10:46:07

ansaurus

tags:

views:

answers:

Using r and weka. How can I use meta-algorithms along with nfold evaluation method?

related questions