
I am looking for a simple suggestion algorithm to implement in my web app, much like what Netflix, Amazon, etc. do, but simpler. I don't need teams of PhDs working to get a better suggestion metric.

So say I have:

  • User1 likes Object1.
  • User2 likes Object1 and Object2.

I want to suggest to User1 they might also like Object2.

I can obviously come up with something naive. I'm looking for something vetted and easily implemented.

+4  A: 

There are many simple and not-so-simple examples of suggestion algorithms in the excellent book Programming Collective Intelligence.

The Pearson correlation coefficient (the Wikipedia article is a little dry) can give pretty good results. Here's an implementation in Python and another in T-SQL, along with an interesting explanation of the algorithm.
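A minimal Python sketch of user-based recommendation with Pearson similarity, similar in spirit to the user-based approach in Programming Collective Intelligence but not the linked implementations. It assumes graded ratings (e.g. 1 to 5 stars): with only binary "likes" every rating is identical, Pearson is undefined, and a set-overlap measure fits better. The names `pearson` and `recommend` and the sample `ratings` are hypothetical.

```python
from math import sqrt

def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation between two users' ratings on the items both rated."""
    shared = [item for item in a if item in b]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [a[i] for i in shared]
    ys = [b[i] for i in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined when either user's ratings are all equal
    return cov / sqrt(var_x * var_y)

def recommend(user: str, ratings: dict[str, dict[str, float]], top_n: int = 3) -> list[str]:
    """Rank items the user hasn't rated by a similarity-weighted average of others' ratings."""
    totals: dict[str, float] = {}
    sim_sums: dict[str, float] = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = pearson(ratings[user], theirs)
        if sim <= 0:
            continue  # ignore users with no or negative correlation
        for item, rating in theirs.items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = sorted(totals, key=lambda i: totals[i] / sim_sums[i], reverse=True)
    return ranked[:top_n]

# Hypothetical 1-5 star ratings.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2, "E": 5},
}

print(recommend("alice", ratings))  # ['D']
```

Here bob correlates positively with alice and carol correlates at -1, so only bob's unseen item D is suggested.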

Yann Schwartz
+1  A: 

You may want to look at association rule learning and the Apriori algorithm. The basic idea behind it is that you create rules like "if a user likes Object1, then the user likes Object2" and check how well they describe (your) reality. In your concrete example, the antecedent "likes Object1" has a support of 2 (two users like Object1), while the rule itself has a support of 1 (only User2 likes both) and a confidence of 50% (it holds in 1 of the 2 cases where the antecedent holds). I've just implemented a basic proof of concept myself (actually my first steps on Hadoop), and it's not too difficult to do.
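To make the support/confidence arithmetic concrete, here is a brute-force Python sketch that scores every single-item rule "likes a -> likes b" on the question's data. It is not Apriori: real Apriori grows larger itemsets level by level and prunes candidates below a minimum support, which this sketch skips because it only considers item pairs. The function names and thresholds are hypothetical.

```python
from itertools import permutations

# The question's example: each user maps to the set of objects they like.
likes = {
    "User1": {"Object1"},
    "User2": {"Object1", "Object2"},
}

def support(itemset: set[str], transactions: dict[str, set[str]]) -> int:
    """Count the users whose likes contain every item in itemset."""
    return sum(1 for liked in transactions.values() if itemset <= liked)

def pair_rules(transactions, min_support=1, min_confidence=0.0):
    """All rules 'likes a -> likes b' as (a, b, support, confidence)."""
    items = set().union(*transactions.values())
    rules = []
    for a, b in permutations(sorted(items), 2):
        both = support({a, b}, transactions)
        if both < min_support:
            continue
        confidence = both / support({a}, transactions)
        if confidence >= min_confidence:
            rules.append((a, b, both, confidence))
    return rules

def suggest(user, transactions, rules):
    """Suggest items implied by rules whose antecedent the user already likes."""
    liked = transactions[user]
    best: dict[str, float] = {}
    for a, b, _, confidence in rules:
        if a in liked and b not in liked:
            best[b] = max(best.get(b, 0.0), confidence)
    return sorted(best, key=best.get, reverse=True)

rules = pair_rules(likes)
print(rules)  # [('Object1', 'Object2', 1, 0.5), ('Object2', 'Object1', 1, 1.0)]
print(suggest("User1", likes, rules))  # ['Object2']
```

Raising `min_support` and `min_confidence` is how you would filter out rules backed by too few users on real data.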

Alternatively, you may want to look at Apache Mahout (Taste), though I've never used it myself.

sfussenegger
+2  A: 

Try the Slope One algorithm; it's one of the most widely used for this kind of problem.

Here's a sample implementation in T-SQL.
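For comparison with the T-SQL version, a Python sketch of the weighted Slope One variant: precompute the average rating difference between every pair of items, then predict an unrated item by averaging the user's ratings shifted by those differences, weighted by how many users rated each pair. This is an independent sketch, not the linked T-SQL or the Python from the blog post in the comment; the names and sample data are hypothetical. Like Pearson, it needs numeric ratings, since all-equal "likes" make every deviation zero.

```python
from collections import defaultdict

def build_deviations(ratings):
    """Return (dev, counts): dev[i][j] is the mean of (r_i - r_j) over users who rated both."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    sums[i][j] += ri - rj
                    counts[i][j] += 1
    dev = {i: {j: sums[i][j] / counts[i][j] for j in sums[i]} for i in sums}
    return dev, counts

def predict(user_ratings, dev, counts):
    """Weighted Slope One prediction for every item the user has not rated yet."""
    num = defaultdict(float)
    den = defaultdict(int)
    for j, rj in user_ratings.items():
        for i, row in dev.items():
            if i in user_ratings or j not in row:
                continue
            weight = counts[i][j]  # number of users who rated both i and j
            num[i] += (row[j] + rj) * weight
            den[i] += weight
    return {i: num[i] / den[i] for i in num}

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}

dev, counts = build_deviations(ratings)
print(predict(ratings["Lucy"], dev, counts))  # {'A': 4.333333333333333}
```

The deviation table grows with the square of the number of items, so in a web app it is usually precomputed offline and updated incrementally.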

Zenon
+1 for the link to source code. Here's Slope One in 40 lines of Python (and a detailed explanation): http://www.serpentine.com/blog/2006/12/12/collaborative-filtering-made-easy/
Jason Orendorff
+1  A: 

I would go with k-nearest neighbors. The Wikipedia entry explains it well and has links to reference implementations.
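A sketch of user-based k-nearest neighbors for the question's like-only data, assuming Jaccard similarity (shared likes divided by combined likes) as the similarity measure; kNN itself does not prescribe this choice. The names `jaccard` and `knn_recommend` and the extra user are hypothetical.

```python
from collections import Counter

def jaccard(a: set, b: set) -> float:
    """Shared likes divided by combined likes (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_recommend(user, likes, k=2, top_n=5):
    """Score unseen items by summing the similarity of the k most similar users who like them."""
    mine = likes[user]
    others = [other for other in likes if other != user]
    neighbors = sorted(others, key=lambda o: jaccard(mine, likes[o]), reverse=True)[:k]
    scores = Counter()
    for other in neighbors:
        sim = jaccard(mine, likes[other])
        if sim == 0:
            continue  # a neighbor with nothing in common adds no evidence
        for item in likes[other] - mine:
            scores[item] += sim
    return [item for item, _ in scores.most_common(top_n)]

likes = {
    "User1": {"Object1"},
    "User2": {"Object1", "Object2"},
    "User3": {"Object3"},  # hypothetical extra user with no overlap
}
print(knn_recommend("User1", likes))  # ['Object2']
```

The parameter `k` trades noise against coverage: small values trust only the closest users, larger ones surface more items with weaker evidence.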

Ofri Raviv
A: 

k-nearest neighbor algorithm

Josh Ribakoff
A: 

I created a suggested-articles algorithm that uses keywords (as opposed to "product purchases") to determine correlation. It takes a keyword, runs through all other articles where that keyword occurs, and produces results based on which articles have the most matching keywords.
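A sketch of that keyword-overlap approach, assuming each article carries a set of keywords. An inverted index (keyword to article ids) avoids rescanning every article per query and is the kind of structure you would cache. This is not the answerer's actual code; the names and sample articles are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(articles):
    """Inverted index mapping each keyword to the set of article ids that use it."""
    index = defaultdict(set)
    for article_id, keywords in articles.items():
        for kw in keywords:
            index[kw].add(article_id)
    return index

def related(article_id, articles, index, top_n=5):
    """Rank other articles by how many keywords they share with article_id."""
    shared = Counter()
    for kw in articles[article_id]:
        for other in index.get(kw, ()):
            if other != article_id:
                shared[other] += 1
    return [other for other, _ in shared.most_common(top_n)]

# Hypothetical articles and keywords.
articles = {
    "a1": {"python", "recommendations", "similarity"},
    "a2": {"python", "recommendations"},
    "a3": {"python", "web"},
    "a4": {"cooking"},
}
index = build_index(articles)  # build once and cache; rebuild when articles change
print(related("a1", articles, index))  # ['a2', 'a3']
```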

Besides the obvious need for caching such information, is there something wrong with him using a similar method?

dclowd9901