I've been trying to understand the AdaBoost algorithm without much success. As an example, I'm struggling to understand the Viola-Jones paper on face detection.

Can you explain AdaBoost in layman's terms and give good examples of when it's used?

+1  A: 

AdaBoost uses a number of training sample images (such as faces) to pick a number of good 'features'/'classifiers'. For face detection, a classifier is typically just a rectangle of pixels that has a certain average color value and a relative size. AdaBoost will look at a number of classifiers and find out which one is the best predictor of a face based on the sample images. After it has chosen the best classifier, it will continue to find another and another until some threshold is reached, and those classifiers combined will provide the end result.
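
For a concrete picture, here is a minimal sketch of such a rectangle classifier in Python. The function name, its parameters, and the choice of thresholding a single rectangle's mean intensity are illustrative assumptions, not the exact Viola-Jones feature:

    import numpy as np

    def rect_mean_classifier(img, top, left, h, w, threshold, polarity=1):
        # Hypothetical weak classifier: votes +1 ("face") when the mean
        # intensity of one rectangle of pixels falls on the chosen side
        # of a threshold, and -1 ("not a face") otherwise.
        mean = img[top:top + h, left:left + w].mean()
        return 1 if polarity * mean < polarity * threshold else -1

AdaBoost's job is then to search over positions, sizes, and thresholds for the rectangle whose predictions best match the labeled sample images.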

This part you may not want to share with non-technical people :) but it is interesting anyway. There are several mathematical tricks that make AdaBoost fast for face detection, such as the ability to add up all the color values of an image and store them in a two-dimensional array (an "integral image") so that the value at any position is the sum of all the pixels above and to the left of that position. This array can be used to very quickly calculate the sum of any rectangle within the image from just four lookups: the value below-right of the rectangle, minus the values above-right and below-left, plus the value above-left (dividing by the number of pixels then gives the average). Using this trick you can quickly scan over an entire image looking for rectangles of different relative sizes that match or are close to a particular color.
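
As a rough sketch (the function names and the zero-padding convention are my own choices, not from the paper), computing the integral image and a rectangle sum might look like this in Python:

    import numpy as np

    def integral_image(img):
        # Summed-area table with a leading row/column of zeros, so
        # ii[r, c] holds the sum of img[:r, :c] and rectangle sums
        # need no special-casing at the image border.
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
        ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
        return ii

    def rect_sum(ii, top, left, height, width):
        # Sum of any rectangle from just four lookups, O(1) no matter
        # how large the rectangle is.
        a = ii[top, left]                    # above-left corner
        b = ii[top, left + width]            # above-right
        c = ii[top + height, left]           # below-left
        d = ii[top + height, left + width]   # below-right
        return d - b - c + a

    img = np.arange(16).reshape(4, 4)
    ii = integral_image(img)
    print(rect_sum(ii, 1, 1, 2, 2) / 4)  # mean of img[1:3, 1:3] -> 7.5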

Hope this helps.

dennisjtaylor
+2  A: 

AdaBoost is an algorithm that combines classifiers with poor individual performance, a.k.a. weak learners, into a single classifier with much higher performance.

How does it work? In a very simplified manner:

  1. Train a weak learner.
  2. Add it to the set of weak learners trained so far (with an optimal weight).
  3. Increase the importance of samples that are still misclassified.
  4. Go to 1.

There is a broad and detailed theory behind the scenes, but the intuition is just that: let each "dumb" classifier focus on the mistakes the previous ones were not able to fix.
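
In code, that loop might look roughly like the following Python sketch. The alpha and weight-update formulas are the standard discrete AdaBoost ones; the candidate-learner interface and the stopping rule are illustrative assumptions:

    import numpy as np

    def adaboost(X, y, weak_learners, rounds):
        # Minimal discrete AdaBoost sketch.
        # X: (n, d) samples; y: labels in {-1, +1};
        # weak_learners: candidate classifiers, each mapping X -> {-1, +1}^n.
        n = len(y)
        w = np.full(n, 1.0 / n)      # start with uniform sample weights
        ensemble = []
        for _ in range(rounds):
            # step 1: pick the learner with the lowest *weighted* error
            errors = [(w * (h(X) != y)).sum() for h in weak_learners]
            best = int(np.argmin(errors))
            err = errors[best]
            if err >= 0.5:           # no learner beats chance: stop early
                break
            # step 2: its ensemble weight; lower error -> larger alpha
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
            ensemble.append((alpha, weak_learners[best]))
            # step 3: boost the weight of samples this learner got wrong
            w *= np.exp(-alpha * y * weak_learners[best](X))
            w /= w.sum()
        return ensemble              # step 4 is the loop itself

    def predict(ensemble, X):
        # Final strong classifier: weighted vote of the chosen learners.
        return np.sign(sum(alpha * h(X) for alpha, h in ensemble))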

AdaBoost is one of the most used algorithms in the machine learning community. In particular, it is useful when you know how to create simple classifiers (possibly many different ones, using different features), and you want to combine them in an optimal way.

In Viola and Jones, each different type of weak learner is associated with one of the four or five different Haar-like features you can have.
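
As an illustration, a two-rectangle feature of that kind is just the difference between the sums of two adjacent rectangles (the direct summation here is my own simplification; Viola and Jones compute these sums via the integral image described in the other answer):

    import numpy as np

    def two_rect_haar(img, top, left, h, w):
        # One Haar-like feature type: the difference between two
        # horizontally adjacent rectangles, which responds to
        # bright-vs-dark edges such as the eye/cheek boundary.
        left_sum = img[top:top + h, left:left + w].sum()
        right_sum = img[top:top + h, left + w:left + 2 * w].sum()
        return float(left_sum - right_sum)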

Hugo