I just read on Wired how team BellKor's Pragmatic Chaos is winning the Netflix Challenge, and I'm curious how this kind of algorithm usually works. I know team BellKor's solution must be an innovative one in the field... but how does the field usually work? Is it just a really detailed database with Markov chains being run over it again and again, or what?
Take a look at this Wikipedia article: Euclidean Distance.
The basic idea is that you use a distance metric (like the Euclidean one above) to compare people or things with each other.
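As a minimal sketch in Python (the users, titles, and ratings below are invented purely for illustration):

```python
from math import sqrt

# Toy ratings on a 1-5 scale; the users and titles are made up.
ratings = {
    "alice": {"Heat": 5, "Up": 3, "Alien": 4},
    "bob":   {"Heat": 4, "Up": 2, "Alien": 5},
    "carol": {"Heat": 1, "Up": 5, "Alien": 2},
}

def euclidean_similarity(prefs, a, b):
    """Similarity based on Euclidean distance over commonly rated items.

    Returns a value in (0, 1]; 1 means identical ratings on the shared items.
    """
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[a][item] - prefs[b][item]) ** 2 for item in shared))
    return 1.0 / (1.0 + dist)

print(euclidean_similarity(ratings, "alice", "bob"))    # close in taste
print(euclidean_similarity(ratings, "alice", "carol"))  # far apart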
The new O'Reilly book, Programming Collective Intelligence: Building Smart Web 2.0 Applications has a great chapter on this very topic.
I found this previous article on Wired, which briefly mentions the k-nearest-neighbor algorithm, used in the past by BellKor and Cinematch.
The observations made by the psychologist on how to find bias are interesting too.
"but how does the field usually work?"
It's a data mining technique. Data mining is used as part of business intelligence (data warehousing and the like) to find relationships and information in huge amounts of data. It's an area of computer science that also deals with machine learning in general, e.g. pattern recognition. Automatic recommendations come out of association mining: an association with high support is shown as a recommendation. The k-nearest-neighbor algorithm is only one of many algorithms used by machine learning/data mining people.
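A rough, self-contained sketch of the k-nearest-neighbor idea in Python (the toy data is made up, and this is not how Cinematch or BellKor actually implement it):

```python
from math import sqrt

# Toy ratings; users and titles are invented for illustration.
ratings = {
    "alice": {"Heat": 5, "Up": 3, "Alien": 4},
    "bob":   {"Heat": 4, "Up": 2, "Alien": 5, "Jaws": 4},
    "carol": {"Heat": 1, "Up": 5, "Alien": 2, "Babe": 5},
}

def similarity(prefs, a, b):
    """Euclidean-distance-based similarity over commonly rated items."""
    shared = [m for m in prefs[a] if m in prefs[b]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[a][m] - prefs[b][m]) ** 2 for m in shared))
    return 1.0 / (1.0 + dist)

def knn_recommend(prefs, user, k=2):
    """Recommend unseen movies by averaging ratings from the k most similar users."""
    neighbours = sorted(
        (u for u in prefs if u != user),
        key=lambda u: similarity(prefs, user, u),
        reverse=True,
    )[:k]
    candidates = {}
    for other in neighbours:
        for movie, rating in prefs[other].items():
            if movie not in prefs[user]:
                candidates.setdefault(movie, []).append(rating)
    return sorted(
        ((sum(r) / len(r), movie) for movie, r in candidates.items()),
        reverse=True,
    )

print(knn_recommend(ratings, "alice"))  # movies alice hasn't rated, best predicted first
```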
If you are interested in the basic theory, I recommend Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten.
For Java there is a great machine learning package, WEKA, that can do association mining. Ian Witten is also one of the authors of WEKA.
Most of the Netflix Competition entrants used variations on Singular Value Decomposition (SVD). The algorithm takes a large matrix and factors it into a much simpler low-rank approximation: with rank 2, every row and every column is reduced to a point in a 2-dimensional space, and points that end up near each other share affinity in the original matrix.
So, in the case of Netflix, you can create a matrix with the movies as the columns and the users as the rows, where the value at [i, j] is the rating user i gave movie j. This is a very large matrix, and applying an SVD to it produces a low-rank approximation in which every user (and every movie) is reduced to a point in a 2-dimensional space. Users who are close to each other on this plane have similar rating patterns, so if one of them has not seen a movie that a nearby user rated highly, that movie is a good candidate recommendation.
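Here is a minimal NumPy sketch of that idea. The 4x4 ratings matrix is made up, and treating unrated entries as 0 is a shortcut that serious competition entries avoided:

```python
import numpy as np

# Toy user-by-movie ratings matrix (rows = users, columns = movies).
# 0 means "not rated"; real systems handle missing values far more carefully.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Factor R and keep only the two largest singular values: a rank-2 approximation.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
user_coords = U[:, :k] * s[:k]   # each user as a point in 2-D space
movie_coords = Vt[:k, :].T       # each movie as a point in the same space

print(np.round(user_coords, 2))   # users 0 and 1 land close together, as do 2 and 3
print(np.round(movie_coords, 2))
print(np.round(user_coords @ Vt[:k, :], 1))  # the rank-2 reconstruction of R
```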
The winning solution used a variation of the plain SVD algorithm called SVD++ and blended it with many other models covering various edge cases, producing a combined predictor that exceeded the 10% improvement needed to claim the prize.
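For a flavour of what SVD++ adds over plain SVD, here is a sketch of its published prediction rule (this is not the winning team's code; the parameter names and the tiny example data are mine, and in a real system all of these values would be learned from the known ratings):

```python
import numpy as np

def svdpp_predict(mu, b_user, b_item, p_user, q_item, y_items, rated_items):
    """SVD++-style prediction for one (user, movie) pair.

    mu          -- global mean rating
    b_user      -- learned bias of this user (scalar)
    b_item      -- learned bias of this movie (scalar)
    p_user      -- learned latent factor vector for the user
    q_item      -- learned latent factor vector for the movie
    y_items     -- dict of implicit-feedback factor vectors, one per movie
    rated_items -- N(u): the set of movies this user has rated at all
    """
    # Implicit feedback: merely having rated a movie says something about taste.
    implicit = sum(y_items[j] for j in rated_items) / np.sqrt(len(rated_items))
    return mu + b_user + b_item + q_item @ (p_user + implicit)

# Tiny made-up example with 20 latent factors.
rng = np.random.default_rng(0)
f = 20
y = {m: rng.normal(scale=0.1, size=f) for m in ["Heat", "Up", "Alien"]}
print(svdpp_predict(3.6, 0.1, -0.2,
                    rng.normal(scale=0.1, size=f),
                    rng.normal(scale=0.1, size=f),
                    y, ["Heat", "Up", "Alien"]))
```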