views: 56

answers: 2
I'm using Weka to perform classification on a set of labelled web pages, and measuring classifier performance with AUC. I have a separate six-level factor that is not used in classification, and I'd like to know how well classifiers perform on each level of the factor.

What techniques or measures should I use to test classifier performance on a subset of data?

+1  A: 

I'm not sure if this is exactly what you are asking, but people often use cross-validation to break a single set of data into multiple training/testing subsets to better evaluate learning performance.

The basic idea of (for example) 10-fold cross-validation is to:

  1. randomly partition your data into 10 equal-sized folds
  2. train a classifier on all folds but one
  3. evaluate its performance on the held-out fold
  4. repeat steps 2-3 nine more times, holding out a different fold each time

The overall performance of the classifier is its average performance across the 10 held-out folds.

I looked around a bit and found some examples of how to perform cross-validation programmatically or via the Weka UI.
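For example, here's a minimal sketch of running 10-fold cross-validation programmatically with Weka's Evaluation class (the file name pages.arff and the J48 classifier are placeholders; substitute your own data and learner):

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CrossValidationDemo {
        public static void main(String[] args) throws Exception {
            // Load the labelled pages; "pages.arff" is a placeholder name.
            Instances data = DataSource.read("pages.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // crossValidateModel handles the fold splitting, training,
            // and testing internally; J48 is just an example classifier.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));

            System.out.println(eval.toSummaryString());
            // AUC with respect to the first class value.
            System.out.println("AUC: " + eval.areaUnderROC(0));
        }
    }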

Nate Kohl
Thanks, I'm already performing cross-validation on the entire data set. After the cross-validation, I want to know how well the classifier performed on each of my factor subsets.
michaeltwofish
Alas. You might get a few more answers if you added a bit more explanation describing your 6-level factor, why evaluating it is a problem, what you've already tried, etc.
Nate Kohl
Users make a relevance judgment about a page, which is my class variable. They also rate the page's "depth" on a six-point scale. I build a classifier based on extracted features (text, link text, number of headings, number of links, for example). I want to know how well the classifier performs at each level of depth.
michaeltwofish
And the problem is that, while I have no problem evaluating the classifier's performance overall, I have no idea how to evaluate performance on a particular subset. So I haven't tried anything yet :)
michaeltwofish
@michaeltwofish: There must be something I'm missing, what's wrong with just calculating AUC separately on the particular subsets of interest?
qdjm
@qdjm It's much more likely I'm missing something :) I didn't even think of calculating AUC separately from the overall results. I'll investigate that, thanks.
michaeltwofish
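Following up on qdjm's suggestion, one way to do this in Weka is to train as usual, then group the test instances by the depth attribute and compute AUC on each group. A rough sketch (the class and method names are hypothetical, and it assumes depth is a nominal attribute present in the instances but not among the features the classifier was trained on):

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.core.Instance;
    import weka.core.Instances;

    public class SubsetAuc {
        // Report AUC separately for each level of a nominal "depth"
        // attribute, given an already-trained classifier.
        public static void aucPerDepth(Classifier trained, Instances train,
                                       Instances test, int depthIndex)
                throws Exception {
            for (int level = 0; level < test.attribute(depthIndex).numValues(); level++) {
                // Start from an empty copy of the test set's header, then
                // keep only the instances whose depth matches this level.
                Instances subset = new Instances(test, 0);
                for (int i = 0; i < test.numInstances(); i++) {
                    Instance inst = test.instance(i);
                    if ((int) inst.value(depthIndex) == level) {
                        subset.add(inst);
                    }
                }
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(trained, subset);
                System.out.println("depth=" + test.attribute(depthIndex).value(level)
                        + "  n=" + subset.numInstances()
                        + "  AUC=" + eval.areaUnderROC(0));
            }
        }
    }

With cross-validation you would probably want to accumulate each level's test instances across all folds before computing AUC, since a single fold may contain very few instances at some depth levels.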
A: 

The steps that Nate Kohl recommended are all correct. Another very important question is which function you use to measure performance. In my experience, maximizing the AUC can sometimes lead to substantial bias in the classifier. I prefer using the Matthews Correlation Coefficient (MCC) for binary classifiers, or Cohen's kappa for categorical classifiers with more than two possible values.
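For reference, Cohen's kappa is reported directly by Weka's Evaluation object via eval.kappa(), and MCC is easy to compute from the per-class confusion counts. A small sketch (the class name AltMetrics is just for illustration):

    import weka.classifiers.Evaluation;

    public class AltMetrics {
        // Matthews Correlation Coefficient from the standard formula,
        // using Weka's confusion counts; classIndex selects which class
        // value is treated as "positive".
        public static double mcc(Evaluation eval, int classIndex) {
            double tp = eval.numTruePositives(classIndex);
            double fp = eval.numFalsePositives(classIndex);
            double tn = eval.numTrueNegatives(classIndex);
            double fn = eval.numFalseNegatives(classIndex);
            return (tp * tn - fp * fn)
                    / Math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        }
    }

Usage with an Evaluation object like the one above: System.out.println("MCC: " + AltMetrics.mcc(eval, 0) + "  kappa: " + eval.kappa());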

bgbg
Yes, the steps are correct for cross-validation, but as I said, that's not actually what I was asking.
michaeltwofish