I'm using Weka to perform classification on a set of labelled web pages, and measuring classifier performance with AUC. I have a separate six-level factor that is not used in classification, and I'd like to know how well classifiers perform on each level of the factor.
What techniques or measures should I use to test classifier performance on a subset of data?