I find I learn new topics best by writing a simple implementation to get a feel for the idea. This is how I learned genetic algorithms and genetic programming. What would be some good introductory programs to write to get started with machine learning?

Preferably, keep any referenced resources accessible online so the community can benefit.

+4  A: 

I think you can write a "Naive Bayes" classifier for junk email filtering. You can get a lot of information from this book.

http://nlp.stanford.edu/IR-book/information-retrieval-book.html
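The core of a naive Bayes spam filter fits in a few lines. Here is a minimal sketch in plain Python, using a tiny invented toy corpus (the words and labels are made up for illustration) and Laplace smoothing so unseen words don't zero out a class:

```python
import math
from collections import Counter

# Toy training corpus (invented for illustration): (words, label)
train = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule today".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]

def fit(train):
    """Count word frequencies per class, plus class frequencies for the priors."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    vocab = set()
    for words, label in train:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(words, word_counts, class_counts, vocab):
    """Pick the class with the highest log posterior, with Laplace smoothing."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

word_counts, class_counts, vocab = fit(train)
print(predict("free money".split(), word_counts, class_counts, vocab))  # spam
```

Working in log space avoids multiplying many small probabilities together and underflowing; that is the standard trick for this classifier.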

Upul
Here is another great free book - The Elements of Statistical Learning - www-stat.stanford.edu/~hastie/Papers/ESLII.pdf
tathagata
EoSL is somewhat difficult, in my opinion. It is not suitable as an initial read; its level is for graduate students.
lmsasu
Yes, I also agree with lmsasu. But the "Introduction to Information Retrieval" book is not that hard to read.
Upul
+1  A: 

A decision tree. It is frequently used for classification tasks and has many variants. Tom Mitchell's book is a good reference for implementing one.
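The heart of the ID3-style algorithm Mitchell describes is choosing the split with the highest information gain. A minimal sketch of that step in Python, on an invented toy weather-style data set (the features and labels are made up for illustration):

```python
import math
from collections import Counter

# Toy data (invented): each row is (features dict, label)
data = [
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "rain", "windy": False}, "yes"),
    ({"outlook": "rain", "windy": True}, "no"),
    ({"outlook": "overcast", "windy": False}, "yes"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def best_feature(data):
    """Choose the feature with the highest information gain, as in ID3."""
    base = entropy([label for _, label in data])
    best, best_gain = None, -1.0
    for feature in data[0][0]:
        values = {row[feature] for row, _ in data}
        # Weighted entropy remaining after splitting on this feature
        remainder = 0.0
        for v in values:
            subset = [label for row, label in data if row[feature] == v]
            remainder += len(subset) / len(data) * entropy(subset)
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = feature, gain
    return best

print(best_feature(data))  # outlook
```

Building the full tree is then just applying this split recursively to each partition until the labels are pure.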

Yin Zhu
A: 

There is something called books; are you familiar with those? When I was exploring AI two decades ago, there were many books. I guess now that the internet exists, books are archaic, but you can probably find some in an ancient library.

Sam Hobbs
I am sorry that people think that books are no good, but there are abundant resources available in books. Books would be much more helpful than the other answers here.
Sam Hobbs
+5  A: 

What language(s) will you develop in? If you are flexible, I recommend MATLAB, Python, and R as good candidates. These are among the more common languages used to develop and evaluate ML algorithms: they facilitate rapid algorithm development, evaluation, data manipulation, and visualization, and most of the popular ML algorithms are available as libraries (with source).

I'd start by focusing on basic classification and/or clustering exercises in two dimensions (R^2). It's easier to visualize, and it's usually sufficient for exploring issues in ML, like risk, class imbalance, noisy labels, online vs. offline training, etc. Create a data set from everyday life or from a problem you are interested in, or use a classic like the Iris data set, so you can compare your progress against the published literature. The Iris data set is easy to find online.

One of its nice features is that it has one class, 'setosa', that is easily linearly separable from the others.

Once you pick a couple of interesting data sets, begin by implementing some standard classifiers and examining their performance. This is a good short list of classifiers to learn:

  • k-nearest neighbors
  • linear discriminant analysis
  • decision trees (e.g., C4.5)
  • support vector machines (e.g., via LibSVM)
  • boosting (with stumps)
  • naive Bayes classifier

With the Iris data set and one of the languages I mentioned, you can run a quick mini-study with any of these classifiers (minutes to hours, depending on your pace).
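As a flavor of such a mini-study, here is a minimal sketch of the first classifier on the list, k-nearest neighbors, in plain Python. The 2-D points below are invented toy data standing in for two well-separated Iris classes:

```python
import math
from collections import Counter

# A few 2-D points (invented for illustration): (x, y, label)
train = [
    (1.0, 1.0, "a"), (1.2, 0.8, "a"), (0.9, 1.1, "a"),
    (3.0, 3.0, "b"), (3.2, 2.9, "b"), (2.8, 3.1, "b"),
]

def knn_predict(x, y, train, k=3):
    """Classify (x, y) by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: math.hypot(p[0] - x, p[1] - y))
    votes = Counter(label for _, _, label in nearest[:k])
    return votes.most_common(1)[0][0]

print(knn_predict(1.1, 0.9, train))  # a
```

Swapping in the real Iris measurements and comparing accuracy as you vary k is exactly the kind of small experiment described above.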

Edit: You can Google "Iris data classification" to find lots of examples. Here is a classification demo by MathWorks using the Iris data set:

http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/classdemo.html

Andrew B.
A: 

Neural nets may be the easiest thing to implement first, and they're fairly thoroughly covered throughout the literature.
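The simplest starting point is a single neuron trained with the perceptron learning rule. A minimal sketch in Python, learning the logical AND function (the hyperparameters here are arbitrary but sufficient for this toy problem):

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Perceptron learning rule: nudge weights toward misclassified samples."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred  # -1, 0, or +1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# Truth table for AND as training data
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
print([1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
       for (x1, x2), _ in and_data])  # [0, 0, 0, 1]
```

A natural follow-up exercise is trying XOR, which a single perceptron cannot learn; that failure motivates multi-layer networks and backpropagation.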

broom