Hi, everybody. I am entirely new to the topic of classification algorithms and need a few good pointers on where to start some "serious reading". I am currently trying to find out whether machine learning and automated classification algorithms would be a worthwhile addition to an application of mine.

I have already skimmed "How to Solve It: Modern Heuristics" by Z. Michalewicz and D. Fogel (in particular, the chapters about linear classifiers using neural networks), and on the practical side, I am currently looking through the WEKA toolkit source code. My next planned step is to dive into Bayesian classification algorithms.

Unfortunately, I lack a serious theoretical foundation in this area (let alone any practical experience with it), so any hints about where to look next would be appreciated; in particular, a good introduction to the available classification algorithms would be helpful. Being more a craftsman than a theoretician, the more practical, the better...

Hints, anyone?

+4  A: 

I've always found Andrew Moore's Tutorials to be very useful. They're grounded in solid statistical theory and will help you understand papers if you choose to read them in the future. Here's a short description of what they cover:

These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and case-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning.

Jacob
Thanks. This was about what I was looking for.
Dirk
+1  A: 

Overview of Machine Learning

To get a good overview of the field, watch the video lectures of Andrew Ng's Machine Learning course.

This course (CS229) -- taught by Professor Andrew Ng -- provides a broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning, unsupervised learning, learning theory, reinforcement learning and adaptive control. Recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing are also discussed.

Classifiers

As for which classifier you should use, I'd recommend starting with Support Vector Machines (SVMs) for general applied classification tasks. They usually give you very strong, often state-of-the-art performance, and you don't really need to understand all of the theory behind them to use the implementation provided by a package like WEKA.

If you have a larger dataset, you might want to try Random Forests. WEKA includes an implementation of this algorithm as well, and Random Forests train much faster than SVMs on large data. While they're less broadly used than SVMs, their accuracy tends to match or nearly match what you could get from an SVM.
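
To give a feel for what a linear SVM does under the hood, here is a minimal sketch in plain Python. It is not how WEKA implements SVMs, and the function names (`train_linear_svm`, `predict`), the toy data and the parameter values are all made up for illustration. It simply minimizes the L2-regularized hinge loss with sub-gradient steps, one example at a time.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=50):
    """Train a linear SVM by sub-gradient descent on the hinge loss.

    points: list of feature tuples; labels: +1 or -1 for each point.
    lam (regularization strength), lr (step size) and epochs are
    illustrative defaults, not tuned values.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The L2 regularizer shrinks the weights on every step.
            w = [(1.0 - lr * lam) * wj for wj in w]
            if margin < 1.0:
                # Point is misclassified or inside the margin:
                # move the separating hyperplane toward classifying it.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, linearly separable data (made up for this example).
points = [(2, 2), (3, 3), (2, 3), (3, 2),
          (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print(predict(w, b, (4, 4)), predict(w, b, (-4, -1)))
```

In practice you would use a library implementation, which also handles kernels, feature scaling and parameter selection; the sketch only shows the basic idea of a margin-based linear classifier.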

dmcer
Thanks. A very readable introduction to SVMs for beginners like me seems to be http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
Dirk
+2  A: 

The answer referring to Andrew Moore's tutorials is a good one. I'd like to augment it, however, by suggesting some reading on the need that drives the creation of many classification systems in the first place: the identification of causal relationships. This is relevant to many modeling problems involving statistical inference.

The best current resource I know of for learning about causality and classifier systems (especially Bayesian classifiers) is Judea Pearl's book "Causality: Models, Reasoning, and Inference".

Joel Hoff