ansaurus

Question

beginner question on investigating on samples in Weka

Answer 1

You can enable the option from:

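[Screenshot not preserved. In the Weka Explorer this option is usually found on the Classify tab under "More options...", as the "Output predictions" checkbox.]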

You will get the following instance predictions:

=== Predictions on test split ===

 inst#     actual   predicted  error prediction
   1   2:Iris-ver  2:Iris-ver         0.667 
  ...
  16   3:Iris-vir  2:Iris-ver   +     0.667
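If the output is long, it may help to pull out only the misclassified rows. Here is a minimal sketch, assuming the predictions were saved to a text file in the column layout shown above, where a `+` in the error column marks a wrong prediction. The function name `misclassified` and the file name in the usage comment are made up for illustration.

```shell
#!/bin/bash

# Print "inst# actual predicted" for every row flagged with "+" in the
# error column. Header lines and "..." lines are skipped because their
# first field is not a plain number.
# Assumption: whitespace-separated columns as in the sample output above.
misclassified() {
    awk '$1 ~ /^[0-9]+$/ && $4 == "+" { print $1, $2, $3 }' "$1"
}

# Usage (hypothetical file name):
#   misclassified predictions.txt
```

On correctly classified rows the error column is empty, so the fourth field is the prediction score rather than `+`, and those rows are not printed.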

EDIT

As I explained in the comments, you can use the StratifiedRemoveFolds filter to manually split the data and create the 10 folds of the cross-validation.

The Primer on the Weka wiki has some examples of how to invoke Weka from the command line. Here's a sample bash script:

#!/bin/bash

# I assume weka.jar is on the CLASSPATH

# 10-folds CV
for f in $(seq 1 10); do
    echo -n "."

    # create train/test set for fold=f
    java weka.filters.supervised.instance.StratifiedRemoveFolds -i iris.arff \
        -o iris-f$f-train.arff -c last -N 10 -F $f -V
    java weka.filters.supervised.instance.StratifiedRemoveFolds -i iris.arff \
        -o iris-f$f-test.arff -c last -N 10 -F $f

    # classify using SVM and store predictions of test set
    java weka.classifiers.functions.SMO -C 1.0 \
        -K "weka.classifiers.functions.supportVector.RBFKernel -G 0.01" \
        -t iris-f$f-train.arff -T iris-f$f-test.arff \
        -p 0 > f$f-pred.txt
        #-i > f$f-perf.txt
done
echo

For each fold, this will create two datasets (train/test) and store the predictions in a text file as well. That way you can match each index with the actual instance in the test set.
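To look up the instance behind a given inst#, something along these lines could work. It is only a sketch: it assumes a dense ARFF file with one instance per line, Unix line endings, and that inst# counts the rows after `@data` starting from 1. The helper name `arff_instance` is hypothetical.

```shell
#!/bin/bash

# Print the n-th data row (1-based) of an ARFF file.
# Blank lines and "%" comment lines in the data section are not counted.
# Assumption: one instance per line, not the sparse "{...}" format spread
# over several lines.
arff_instance() {
    local arff=$1 inst=$2
    awk -v n="$inst" '
        in_data && NF && $0 !~ /^%/ { if (++row == n) { print; exit } }
        tolower($1) == "@data"       { in_data = 1 }
    ' "$arff"
}

# Usage, with the file names produced by the script above:
#   arff_instance iris-f3-test.arff 16
```

The first awk rule is deliberately placed before the `@data` check, so the `@data` line itself is never counted as an instance.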

Of course the same can be done in the GUI if you prefer (only a bit more tedious!)

Amro 2010-09-16 02:35:33

Thanks, Amro, that really helps a lot. One more question: if I want to investigate further, that is, look at the original sample behind a misclassified instance to do some pattern analysis, how could I do that? In other words, how can I locate the corresponding entry in my original .arff data file? Thanks!

Robert 2010-09-16 03:03:54

I guess it depends on the test procedure you're using. If you use the same dataset for both training and testing, then the inst# will be in the same order as the file. Otherwise, you can manually split the dataset using the **StratifiedRemoveFolds** filter (say 2/3 for train and 1/3 for test, or even cross-validation folds) and supply the new file as a test set; that way the order of instances is preserved. An alternative is to use the **AddClassification** filter, which adds a new column to the dataset containing the prediction from your algorithm of choice.

Amro 2010-09-16 03:32:07

... Those filters can be applied in the "Preprocess" tab

Amro 2010-09-16 03:33:32

Thanks, Amro. I used 10-fold cross-validation on the whole set, for both training and testing. Could I assume that the second record in the first 10-item group of the results corresponds to the second one in the original data file, while the 2nd item of the second 10-item group corresponds to the 12th one in the original data file?

Robert 2010-09-16 03:38:50

@Robert: I added a sample bash script to perform the task

Amro 2010-09-16 05:15:30
