How can we interpret the classification results in Weka using Naive Bayes?

How are the mean, standard deviation, weight sum and precision calculated?

How are the kappa statistic, mean absolute error, root mean squared error, etc. calculated?

What is the interpretation of the confusion matrix?

A: 

Below is some sample output for a naive Bayes classifier, using 10-fold cross-validation. There's a lot of information there, and what you should focus on depends on your application. I'll explain some of the results below, to get you started.

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          71               71      %
Incorrectly Classified Instances        29               29      %
Kappa statistic                          0.3108
Mean absolute error                      0.3333
Root mean squared error                  0.4662
Relative absolute error                 69.9453 %
Root relative squared error             95.5466 %
Total Number of Instances              100     

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.967     0.692      0.686     0.967     0.803      0.709    0
                 0.308     0.033      0.857     0.308     0.453      0.708    1
Weighted Avg.    0.71      0.435      0.753     0.71      0.666      0.709

=== Confusion Matrix ===

  a  b   <-- classified as
 59  2 |  a = 0
 27 12 |  b = 1

The correctly and incorrectly classified instances show the percentage of test instances that were correctly and incorrectly classified. The raw numbers are shown in the confusion matrix, with a and b representing the class labels. Here there were 100 instances, so the percentages and raw numbers add up: aa + bb = 59 + 12 = 71 correct, ab + ba = 2 + 27 = 29 incorrect.
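
If you want to check the arithmetic, here is a small plain-Python sketch (not Weka code) that reproduces the accuracy and the per-class TP rate, FP rate and precision directly from the confusion matrix above:

# Rows are the true classes (0 and 1); columns are the predicted labels (a and b).
matrix = [[59, 2],    # true class 0: 59 predicted as a, 2 predicted as b
          [27, 12]]   # true class 1: 27 predicted as a, 12 predicted as b

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(matrix)))
print("accuracy:", correct / total)

for c in range(len(matrix)):
    tp = matrix[c][c]
    fn = sum(matrix[c]) - tp                                  # true c, predicted as the other class
    fp = sum(matrix[r][c] for r in range(len(matrix))) - tp   # other class, predicted as c
    tn = total - tp - fn - fp
    print(f"class {c}: TP rate {tp / (tp + fn):.3f}, "
          f"FP rate {fp / (fp + tn):.3f}, "
          f"precision {tp / (tp + fp):.3f}")

# accuracy: 0.71
# class 0: TP rate 0.967, FP rate 0.692, precision 0.686
# class 1: TP rate 0.308, FP rate 0.033, precision 0.857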

The percentage of correctly classified instances is often called accuracy or sample accuracy. It has some disadvantages as a performance estimate (not chance corrected, not sensitive to class distribution), so you'll probably want to look at some of the other numbers. ROC Area, or area under the ROC curve, is my preferred measure.
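
ROC area also has a handy rank interpretation: it is the probability that a randomly chosen positive instance gets a higher predicted score than a randomly chosen negative one (ties counting half). A tiny illustrative sketch, using made-up scores rather than the ones behind the output above:

positives = [0.9, 0.8, 0.4]   # predicted probability of the positive class, for truly positive instances
negatives = [0.7, 0.3, 0.2]   # predicted probability of the positive class, for truly negative instances

pairs = [(p, n) for p in positives for n in negatives]
auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)
print(auc)   # about 0.889 for these made-up scores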

Kappa is a chance-corrected measure of agreement between the classifications and the true classes. It's calculated by subtracting the agreement expected by chance from the observed agreement and dividing by the maximum possible improvement over chance (1 minus the expected agreement). A value greater than 0 means that your classifier is doing better than chance (it really should be!).
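
To make that concrete, here is a short sketch (plain Python, not Weka's own code) that reproduces the 0.3108 from the confusion matrix above:

matrix = [[59, 2], [27, 12]]
total = sum(sum(row) for row in matrix)

observed = sum(matrix[i][i] for i in range(len(matrix))) / total   # 0.71

# Agreement expected by chance: for each class, (row total * column total) / total^2
expected = sum(
    sum(matrix[i]) * sum(matrix[r][i] for r in range(len(matrix)))
    for i in range(len(matrix))
) / total ** 2                                                      # 0.5792

kappa = (observed - expected) / (1 - expected)
print(round(kappa, 4))   # 0.3108, matching the summary above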

The error measures are used mainly for numeric prediction rather than classification. In numeric prediction, predictions aren't just right or wrong; the error has a magnitude, and these measures reflect that. For a classifier, Weka computes them from the predicted class probabilities rather than the hard class decisions.
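
As a rough illustration of the idea, here is a sketch with a few made-up probability predictions (these numbers are invented and do not reproduce the 0.3333 / 0.4662 above):

# Each entry is (predicted probability of class 1, actual class).
predictions = [(0.9, 1), (0.2, 0), (0.6, 0), (0.4, 1)]

errors = [abs(p - actual) for p, actual in predictions]
mae = sum(errors) / len(errors)                            # mean absolute error
rmse = (sum(e ** 2 for e in errors) / len(errors)) ** 0.5  # root mean squared error
print(mae, rmse)   # 0.375 and about 0.439 for these made-up predictions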

Hopefully that will get you started.

michaeltwofish