Hi all,

I am having trouble selecting the right classifier for my data-mining task.

I am labeling webpages using a statistical method, on a 1-4 scale, with 1 being the poorest and 4 the best.

Previously, I used an SVM to train the system, since I was using a binary (1, 0) label. Now that I have switched to this 4-class label, I need to change classifiers, because I think an SVM classifier only works for two-class classification (please correct me if I am wrong).

So could you please suggest which kind of classifier is most appropriate for this classification task?

Thanks in advance for suggestions.

+4  A: 

There exist multi-class SVMs. LibSVM has an implementation, as does Weka.
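To see why a binary learner is not a dead end, here is a minimal sketch of the one-vs-rest reduction: train one binary scorer per class, then predict the class whose scorer is most confident. The perceptron is only a stand-in for a binary SVM, the toy data is made up, and this is not how any particular library does it internally (as far as I know, LibSVM uses a one-vs-one scheme instead).

```python
def train_perceptron(X, y, epochs=50):
    """Train a binary perceptron; y holds +1 / -1. Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified: move the boundary toward xi
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b


def train_one_vs_rest(X, labels):
    """Train one 'this class vs. everything else' model per distinct label."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = train_perceptron(X, y)
    return models


def predict(models, x):
    """Return the label whose binary model gives x the highest score."""
    def score(c):
        w, b = models[c]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


# Hypothetical toy data: two features per page, labels on the 1-4 scale.
X = [(-1, -1), (-2, -2), (-1, 1), (-2, 2), (1, -1), (2, -2), (1, 1), (2, 2)]
labels = [1, 1, 2, 2, 3, 3, 4, 4]

models = train_one_vs_rest(X, labels)
predictions = [predict(models, x) for x in X]
```

Any binary classifier that outputs a score can be dropped in place of `train_perceptron`.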

Usually it's better to experiment with several classifiers to find out which one works best on your data. The choice of classifier type and training algorithm is far less important than your choice of feature set. You could try naïve Bayes, multi-class SVM, MaxEnt, voted perceptrons, or whatever your library offers.
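A rough sketch of what "experiment with several classifiers" can look like: run each candidate through the same k-fold cross-validation and compare mean accuracy. Everything here is illustrative; the two toy classifiers (a majority-class baseline and a nearest-centroid rule) and the data are assumptions standing in for whatever your library offers.

```python
from collections import Counter


def majority_baseline(train):
    """Always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_centroid(train):
    """Predict the label whose mean feature vector is closest to x."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()
    }

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )
    return predict


def cross_val_accuracy(make_model, data, k=3):
    """Mean accuracy over k folds; fold i holds out data[i::k]."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        predict = make_model(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / k


# Hypothetical toy data: (features, label) pairs with labels 1-4.
data = [((10 * c + d, 0.0), c) for d in (-1, 0, 1) for c in (1, 2, 3, 4)]

candidates = [("majority", majority_baseline), ("nearest centroid", nearest_centroid)]
results = {name: cross_val_accuracy(make, data) for name, make in candidates}
```

Keeping a trivial baseline in the comparison makes it obvious whether a fancier model is actually learning anything from the features.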

larsmans
Thanks! Do you know how to enable multi-class SVM in Weka, please? I played around with it for a while, but it only worked for binary classes.
Robert
I don't use Weka myself, but apparently you need either the `weka.classifiers.functions.SMO` class or the separate plugin WLSVM (http://www.cs.iastate.edu/~yasser/wlsvm/).
larsmans
I would love to know the evidence behind that statement about the relative importance of feature set vs. algorithm type. I've just run into an example where going from naive Bayes to SVM made a big difference, and the feature set was exactly the same. And if you listen to Google's Norvig, neither matters; only training set size does.
piccolbo
Naïve Bayes and ID3 are perhaps the exception; of course there are differences, but among the *newer* algorithms, they're not all that big in my experience. And yes, as Norvig (and Microsoft's Eric Brill) showed, training set size is much more important, but I guessed the OP had a fixed set.
larsmans
+2  A: 

You are talking about "ordinal classification". It can be done using a modified SVM (as already mentioned, it is also implemented in libSVM), logistic regression, and even decision trees or artificial neural networks.

You can even treat your labels as continuous values, perform a regression analysis of your choice, and then discretize the output. Most of the methods I have mentioned above do that behind the scenes.
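As a minimal sketch of the regress-then-discretize idea: fit an ordinary least-squares line to the numeric labels, then round each prediction and clip it back into the 1-4 range. The single feature and the toy data are assumptions for illustration; any regression method could replace `fit_line`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature. Returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_label(model, x, lo=1, hi=4):
    """Regress, then discretize: round to the nearest label and clip to [lo, hi]."""
    slope, intercept = model
    raw = slope * x + intercept
    return min(hi, max(lo, round(raw)))


# Hypothetical toy data: one numeric quality feature per page, labels 1-4.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 2, 2, 3, 3, 4, 4]

model = fit_line(xs, ys)
predicted = [predict_label(model, x) for x in xs]
```

Unlike plain multi-class classification, this keeps the ordering of the labels: a page that should be a 4 is more likely to be mistaken for a 3 than for a 1.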

Good luck

bgbg
+1  A: 

You might want to check Andrew Ng's lecture on how to choose the ML algorithm that best suits you. I think it is quite enlightening, and it might give you some insight into how to manage your data.

Leon palafox