
Hello there,

I've just used Weka to train my SVM classifier in the "Classify" tab. Now I want to find out which data samples are misclassified so I can study their patterns, but I don't know where to look for this in Weka. Could anyone give me some help, please? Thanks in advance.

A:

You can enable the "Output predictions" option in the "More options..." dialog of the Classify tab.

You will get the following instance predictions:

=== Predictions on test split ===

 inst#     actual   predicted  error prediction
   1   2:Iris-ver  2:Iris-ver         0.667 
  ...
  16   3:Iris-vir  2:Iris-ver   +     0.667 
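If you save that listing to a text file (for example by copying it out of the GUI's output panel, or with the command-line -p 0 option used in the script below), you can pull out just the misclassified rows, which carry a + in the error column. This is a minimal sketch that assumes the whitespace-separated layout shown above; the function name misclassified is made up for illustration and is not part of Weka:

```shell
#!/bin/bash
# Print only the misclassified rows of a saved Weka prediction listing.
# Hypothetical helper; assumes the plain-text layout shown above:
#   inst#  actual  predicted  error  prediction
# where the error column holds "+" for a wrong prediction and is empty otherwise.
misclassified() {
  # Data rows start with a numeric inst#. A correct row has only 4 fields
  # (the empty error column collapses), so a "+" in field 4 marks an error.
  awk '$1 ~ /^[0-9]+$/ && $4 == "+"' "$1"
}
```

For example, `misclassified f1-pred.txt` would print the row for instance 16 (and any other row flagged with +) if that listing had been saved as f1-pred.txt.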

EDIT

As I explained in the comments, you can use the StratifiedRemoveFolds filter to split the data manually and create the 10 folds of the cross-validation yourself.

The Primer on the Weka wiki has some examples of how to invoke Weka from the command line. Here's a sample bash script:

#!/bin/bash

# I assume weka.jar is on the CLASSPATH

# 10-folds CV
for f in $(seq 1 10); do
    echo -n "."

    # create train/test set for fold=f
    java weka.filters.supervised.instance.StratifiedRemoveFolds -i iris.arff \
        -o iris-f$f-train.arff -c last -N 10 -F $f -V
    java weka.filters.supervised.instance.StratifiedRemoveFolds -i iris.arff \
        -o iris-f$f-test.arff -c last -N 10 -F $f

    # classify using SVM and store predictions of test set
    java weka.classifiers.functions.SMO -C 1.0 \
        -K "weka.classifiers.functions.supportVector.RBFKernel -G 0.01" \
        -t iris-f$f-train.arff -T iris-f$f-test.arff \
        -p 0 > f$f-pred.txt
        # alternatively, drop -p 0 and append -i > f$f-perf.txt to save per-class statistics
done
echo

For each fold, this creates two datasets (train and test) and stores the test-set predictions in a text file. That way you can match each inst# with the corresponding instance in that fold's test file.
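To go one step further and print the actual data rows, you can pair each fold's prediction file with its test ARFF. This is a minimal sketch, assuming the file names produced by the script above and that inst# is the 1-based position of the instance in the test file's @data section; misclassified_rows is a hypothetical helper, not part of Weka:

```shell
#!/bin/bash
# Print the ARFF data rows of the misclassified test instances of one fold.
# Hypothetical helper; assumes the plain-text "-p 0" prediction layout.
#   $1: prediction listing for the fold (e.g. f3-pred.txt)
#   $2: test ARFF file for the same fold (e.g. iris-f3-test.arff)
misclassified_rows() {
  awk '
    # Pass 1 (prediction file): remember inst# of rows flagged with "+".
    FILENAME == ARGV[1] {
      if ($1 ~ /^[0-9]+$/ && $4 == "+") wrong[$1] = 1
      next
    }
    # Pass 2 (ARFF file): start counting instances after the @data line.
    tolower($1) == "@data" { in_data = 1; next }
    # Skip blank lines and % comments so the counter lines up with inst#.
    in_data && NF > 0 && $1 !~ /^%/ {
      n++
      if (n in wrong) print n ": " $0
    }
  ' "$1" "$2"
}
```

Usage over all folds: `for f in $(seq 1 10); do echo "fold $f"; misclassified_rows f$f-pred.txt iris-f$f-test.arff; done`. Each inst# is only meaningful within its own fold's test file, which is why the helper takes the two files as a pair.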

Of course, the same can be done in the GUI if you prefer (it's just a bit more tedious!)

Amro
Thanks, Amro, that really helps a lot. One more question: if I want to investigate further, I need to look at the original sample for each misclassified instance to do some pattern analysis. How could I do that? In other words, how could I locate the corresponding entry in my original .arff data file? Thanks!
Robert
I guess it depends on the test procedure you're using. If you use the same dataset for both training and testing, then the inst# will be in the same order as in the file. Otherwise, you can split the dataset manually using the **StratifiedRemoveFolds** filter (say 2/3 for training and 1/3 for testing, or even cross-validation folds) and supply the new file as a test set; that way the order of instances is preserved. An alternative is the **AddClassification** filter, which adds a new column to the dataset containing the prediction of your algorithm of choice.
Amro
... Those filters can be applied in the "Preprocess" tab
Amro
Thanks, Amro. I used 10-fold cross-validation on the whole set for both training and testing, so can I assume that the 2nd record in the first group of 10 results corresponds to the 2nd record in the original data file, while the 2nd record in the second group of 10 corresponds to the 12th record in the original file?
Robert
@Robert: I added a sample bash script to perform the task
Amro