Having just read the recent article in Wired, I'm curious: what is it about the Netflix Prize that's so challenging? I mean this in the sincerest way possible; I'm just curious about the difficulties posed by the contest. Are most recommendation engines in general this hard to improve? If so, why is that? Or is Netflix unusually difficult to improve, and if so, what's special about Netflix that makes it so much more challenging than, say, Amazon?
Because Netflix already has a really good recommendation engine. If they knew how to improve it easily, they would have done so by now. Their whole business model is built around cross-selling products (movies) to consumers. The recommendation algorithm really is the core of their business: the better it works, the more money they stand to make.
I think there have been some articles written on this, but I don't know where they are at the moment so I'll just explain it here.
When people shop Amazon for books (for example), they tend to buy books of a specific type, so it can be easy to suggest other books of the same type.
With movies, people may do the same thing, but they usually don't limit themselves to one genre. People watch a much wider variety of movies: horror, comedy, action, romance, etc.
Predicting what you'll like across those genres is tough if you've only rented one movie so far, and that movie is a drama.
If someone were to come up with a very clever recommendation engine, Netflix could benefit from it phenomenally. I think they're mainly looking for an engine that can make recommendations based on only one or two movies. New customers who don't know much about Netflix have a better chance of sticking around if they find movies they like early on without having to search for them.
In my opinion, they already have a recommendation engine on par with Amazon. I think they're looking to enhance it further.
My colleague and I took part in it. I don't have a strong AI background, but recommendation engines require deep knowledge of the existing literature: algorithms like Gibbs sampling, k-means, nearest neighbour, etc. We used Gibbs sampling, and I can say we sucked :) compared to what Netflix already has.
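To give a feel for the kind of baseline these methods start from, here is a minimal user-based nearest-neighbour sketch: predict a user's rating for a movie as a similarity-weighted average of other users' ratings. The users, movies, and ratings are made-up toy data, not anything from the actual Netflix dataset.

```python
import math

# Toy ratings: user -> {movie: rating on a 1-5 scale} (made-up data)
ratings = {
    "alice": {"Alien": 5, "Up": 2, "Heat": 4},
    "bob":   {"Alien": 4, "Up": 1, "Heat": 5, "Amelie": 2},
    "carol": {"Alien": 1, "Up": 5, "Amelie": 4},
}

def cosine_sim(a, b):
    """Cosine similarity over the movies both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    na = math.sqrt(sum(a[m] ** 2 for m in common))
    nb = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (na * nb)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        s = cosine_sim(ratings[user], their)
        num += s * their[movie]
        den += s
    return num / den if den else None

print(predict("alice", "Amelie"))
```

Since alice's tastes track bob's far more than carol's, the prediction lands closer to bob's rating of 2 than carol's 4. Getting from a baseline like this to beating Netflix's own engine by 10% is where the real difficulty lies.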
Recommender systems suffer from problems that are hard to fix:
- Cold start - In a new system or with a new user, there isn't enough data to create an accurate statistical model for a recommendation.
- Rating bias - If you base recommendations on user ratings, users that rate often sway the results toward their taste. If you're the type of person that doesn't like the extra step of rating, it's possible people with similar taste don't like rating either so their opinions are excluded from recommendations.
- Items that are not rated are less likely to be rated - if you discover, and therefore rate, items based on their ratings, items that aren't rated are less visible and will have a hard time getting the ratings they need to affect recommendations. In the other direction, popular items have more visibility, are rated more often, and therefore play a larger part in recommendations.
- Temporal bias - Users' ratings change with time. With long-term changes, you can compensate by adding a time element to your recommendations. Short-term changes are harder to fix. After a Chuck Norris marathon, you may be more likely to give action movies high marks. The next day, after crying your eyes out to Steel Magnolias, you may be temporarily biased against action movies.
- Varying motives - in item-based recommender systems, the knitting book you purchased for your aunt's birthday will skew your recommendations (if you don't take the time to tell the system not to use it). You may give a bad kids' movie a high rating because your kids loved it.
Altogether, this makes recommender systems hard to improve past just-okay. A system with 80% accuracy seems great but is still wrong 1 out of 5 times. That makes them more trouble than they're worth for some users.
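The "time element" mentioned for long-term temporal bias can be as simple as exponentially decaying each rating's weight by its age, so ratings from a taste you've drifted away from count for less. A minimal sketch with made-up numbers (the 90-day half-life is an arbitrary assumption):

```python
# Toy rating history for one user's action movies: (rating, days_ago) pairs.
# The two 5s are from a recent marathon; the low ratings are months old,
# from before this user's tastes shifted toward action.
history = [(5, 1), (5, 1), (2, 200), (3, 350)]

def decayed_mean(pairs, half_life_days=90):
    """Average ratings, halving each rating's weight every half_life_days."""
    num = den = 0.0
    for rating, age in pairs:
        w = 0.5 ** (age / half_life_days)  # weight 1.0 today, 0.5 at 90 days
        num += w * rating
        den += w
    return num / den

plain = sum(r for r, _ in history) / len(history)
print(round(plain, 2), round(decayed_mean(history), 2))
```

The plain average is 3.75, while the decay-weighted average is roughly 4.7, tracking the recent drift. Note this same mechanism makes the short-term problem worse, not better: yesterday's marathon dominates even more, which is why short-term swings are the harder part to fix.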