ansaurus

Question

How to purposely overfit Weka tree classifiers?

Answer 1

+2 A:

The quick and dirty solution is to resample. Throw away all but 1500 of your positive examples and train on a balanced data set. I am pretty sure there is a resample component in Weka to do this.
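To make the resampling idea concrete, here is a minimal plain-Java sketch of random undersampling. It does not use Weka at all: the class `UndersampleSketch`, the method `balancedIndices`, and the 0/1 integer label encoding are all hypothetical names chosen for illustration. Inside Weka, the supervised instance filters `SpreadSubsample` and `Resample` should cover the same ground, but check their options against your Weka version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Weka: balances a two-class data set
// by randomly discarding rows of the larger class.
class UndersampleSketch {

    /**
     * Returns the (sorted) row indices of a balanced subset: every row of
     * the smaller class plus an equally sized random sample of the larger one.
     * Labels are assumed to be 0 or 1.
     */
    static List<Integer> balancedIndices(int[] labels, long seed) {
        List<Integer> zeros = new ArrayList<>();
        List<Integer> ones = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == 0) {
                zeros.add(i);
            } else {
                ones.add(i);
            }
        }
        boolean zerosAreMinority = zeros.size() <= ones.size();
        List<Integer> minority = zerosAreMinority ? zeros : ones;
        List<Integer> majority = zerosAreMinority ? ones : zeros;

        // Shuffle with a fixed seed so the sample is reproducible.
        Collections.shuffle(majority, new Random(seed));

        List<Integer> kept = new ArrayList<>(minority);
        kept.addAll(majority.subList(0, minority.size()));
        Collections.sort(kept);
        return kept;
    }

    public static void main(String[] args) {
        // 3 rows of class 0 followed by 10 rows of class 1.
        int[] labels = {0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        List<Integer> kept = balancedIndices(labels, 42L);
        System.out.println("kept " + kept.size() + " rows: " + kept);
    }
}
```

The returned indices can then be used to pick the rows that go into the training set; the discarded majority rows are simply left out.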

The other solution is to use a classifier with a variable cost for each class. I'm pretty sure libSVM allows you to do this, and I know Weka can wrap libSVM. However, I haven't used Weka in a while, so I can't be of much practical help here.

StompChicken 2010-07-11 16:53:35

Thanks. I'm not sure resampling would work. From the experiments I ran, it seems that even on a fairly balanced dataset (1000 examples for each class), J48 and other classifiers (except SimpleCart) get ridiculous results: a very high FP or FN rate for either class "0" or class "1", while the other class is classified mostly correctly. Regarding cost-sensitive classification, I totally forgot about it; I'll look into it soon. Thank you!

Haggai 2010-07-11 17:07:50

The cost-sensitive approach worked. I still don't understand why unpruned J48 won't give me 100% accuracy on the training set, or why J48 still gave ridiculous outputs on a fairly balanced dataset. But at least now I have something to work with. Thanks!

Haggai 2010-07-12 14:12:18

Answer 2

+2 A:

Weka contains two metaclassifiers: weka.classifiers.meta.CostSensitiveClassifier and weka.classifiers.meta.MetaCost. They allow you to make any algorithm cost-sensitive (not only SVM) and to specify a cost matrix (the penalty for each kind of error); you would give a higher penalty for misclassifying 1 instances as 0 than for erroneously classifying 0 as 1.

The result is that the algorithm would then try to: "minimize expected misclassification cost (rather than the most likely class)"
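The decision rule in that quote can be sketched in a few lines of plain Java. This is only an illustration of the idea, not Weka's implementation: the class `MinExpectedCostSketch` and the method `predict` are hypothetical, and the cost matrix is assumed to be indexed as `cost[actual][predicted]`.

```java
// Hypothetical illustration of a minimum-expected-cost decision rule.
class MinExpectedCostSketch {

    /**
     * Picks the class whose prediction has the lowest expected cost.
     * classProbs[i] is the estimated probability that the true class is i;
     * cost[i][j] is the penalty for predicting j when the true class is i
     * (an assumed layout: rows are actual classes, columns are predictions).
     */
    static int predict(double[] classProbs, double[][] cost) {
        int best = 0;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int j = 0; j < cost[0].length; j++) {
            double expected = 0.0;
            for (int i = 0; i < classProbs.length; i++) {
                expected += classProbs[i] * cost[i][j];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.7, 0.3}; // the base classifier leans towards class 0

        // Equal costs: the most likely class wins.
        double[][] uniform = {{0, 1}, {1, 0}};
        System.out.println(predict(probs, uniform)); // prints 0

        // Calling a true 1 a 0 is five times as costly as the reverse,
        // so predicting 0 costs 0.3 * 5 = 1.5 and predicting 1 costs 0.7.
        double[][] skewed = {{0, 1}, {5, 0}};
        System.out.println(predict(probs, skewed)); // prints 1
    }
}
```

With the skewed matrix the prediction flips to class 1 even though class 0 is more probable, which is the effect of penalising "1 classified as 0" more heavily.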

Amro 2010-07-15 19:57:05

Thanks, that's exactly the solution I've used.

Haggai 2010-07-16 06:32:18
