collaborative-filtering

Open Source collaborative filtering frameworks

I was wondering if there exist any open source frameworks that will help me add the following type of functionality to my website: 1) If I am viewing a particular product, I would like to see what other products may be interesting to me. This information may be deduced by calculating, for example, what other people in my region (or ...

Capturing Implicit Signals of Interest in Django

To set the background: I'm interested in capturing implicit signals of interest in books as users browse around a site. The site is written in Django (Python) using MySQL, memcached, nginx, and Apache. Let's say, for instance, my site sells books. As a user browses around my site I'd like to keep track of which books they've viewed, ...

Collaborative Filtering: Ways to determine implicit scores for products for each user?

Having implemented an algorithm to recommend products with some success, I'm now looking at ways to calculate the initial input data for this algorithm. My objective is to calculate a score for each product that a user has some sort of history with. The data I am currently collecting: User order history Product pageview history for b...

What is the algorithm behind recommendation sites like last.fm, Grooveshark, Pandora?

I am thinking of starting a project based on a recommendation system. I need to improve my knowledge of this area, which looks like a hot topic on the web. I am also wondering which algorithms last.fm, Grooveshark, and Pandora use for their recommendation systems. If you know any book, site or any resource for this kind of algorithm pl...

Building a Collaborative filtering / Recommendation System

I'm in the process of designing a website that is built around the concept of recommending various items to users based on their tastes. (i.e. items they've rated, items added to their favorites list, etc.) Some examples of this are Amazon, Movielens, and Netflix. Now, my problem is, I'm not sure where to start in regards to the mathema...

Is KNN valuable if most ratings are a 5 / passive filtering recommendations

I've been looking at building a 'people who like x, also like y' type recommendation system, and was looking at using Vogoo, but after looking through their code it seems there is a lot of nearest neighbor based on ratings. Over the last few weeks I've seen a few articles stating that most people either don't rate at all, or rate a 5 h...

Best similarity metric for collaborative filtering?

I'm trying to decide on the best similarity metric for a product recommendation system using item-based collaborative filtering. This is a shopping basket scenario where ratings are binary valued: the user has either purchased an item or not, and there is no explicit rating system (e.g., 5 stars). Step 1 is to compute item-to-item similarit...

Collaborative Filtering Program: What to do for a Pearson Score When There Isn't Enough Data

I'm building a recommendation engine using collaborative filtering. For similarity scores, I use a Pearson correlation. This is great most of the time, but sometimes I have users that only share 1 or 2 fields. For example: User 1 { a: 4, b: 2 } User 2 { a: 4, b: 3 }. Since this is only 2 data points, a Pearson correlation would always...
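
To make the two-user example in this excerpt concrete, here is a minimal Python sketch that computes a Pearson correlation over only the items both users rated. The `damped` helper and its `min_overlap=10` threshold are assumptions added for illustration (one common way to discount similarities backed by very little overlap), not something the question itself proposes.

```python
from math import sqrt

def pearson_shared(u, v):
    """Pearson correlation over the items both users rated.

    Returns (r, n), where n is the number of shared items and r is None
    when the correlation is undefined (fewer than 2 shared items, or
    zero variance on either side).
    """
    shared = sorted(set(u) & set(v))
    n = len(shared)
    if n < 2:
        return None, n
    xs = [u[k] for k in shared]
    ys = [v[k] for k in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None, n
    return cov / sqrt(var_x * var_y), n

def damped(r, n, min_overlap=10):
    """Scale r down linearly when fewer than min_overlap items are shared.

    min_overlap is a hypothetical tuning parameter, not a recommended value.
    """
    return r * min(n, min_overlap) / min_overlap

user1 = {"a": 4, "b": 2}
user2 = {"a": 4, "b": 3}

r, n = pearson_shared(user1, user2)
print(r, n, damped(r, n))
```

With only two shared items the raw score is at an extreme even though the users disagree on `b`; the damped score keeps the sign but weights it by how much evidence supports it.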

SQL to calculate the Tanimoto Coefficient of several vectors

Hi, I think it's easier to explain my problem with an example. I have one table with ingredients for recipes and I have implemented a function to calculate the Tanimoto coefficient between ingredients. It's fast enough to calculate the coefficient between two ingredients (3 SQL queries needed), but it does not scale well. To calculate ...
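
For reference, the Tanimoto coefficient between two ingredients, treated as the sets of recipes that contain them, is |A ∩ B| / (|A| + |B| - |A ∩ B|). The Python sketch below shows one way to get all pairwise coefficients from a single pass over the recipes instead of one query per pair. The `recipes` data and the function name are hypothetical, and this is an in-memory illustration rather than the asker's SQL schema.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: recipe name -> set of ingredients.
recipes = {
    "pancakes": {"flour", "egg", "milk"},
    "omelette": {"egg", "milk", "cheese"},
    "bread": {"flour", "yeast"},
}

def tanimoto_all_pairs(recipes):
    """Tanimoto coefficient for every ingredient pair that co-occurs.

    One pass collects per-ingredient counts and pair co-occurrence counts;
    pairs that never appear together are omitted (their coefficient is 0).
    """
    count = Counter()  # number of recipes containing each ingredient
    both = Counter()   # number of recipes containing both ingredients
    for ingredients in recipes.values():
        count.update(ingredients)
        both.update(combinations(sorted(ingredients), 2))
    return {
        (a, b): n_ab / (count[a] + count[b] - n_ab)
        for (a, b), n_ab in both.items()
    }

scores = tanimoto_all_pairs(recipes)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Keys are alphabetically ordered pairs, so each pair is stored once. The same counting idea can be expressed as a self-join with `GROUP BY` in SQL, but that translation is not shown here.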

What is the difference between Collaborative Filtering and Collaborative Quality Filtering?

Hello, I am currently looking into Collaborative Quality Filtering and was just wondering: what is the difference between Collaborative Filtering and Collaborative Quality Filtering? It seems to me that they are both exactly the same thing (different names for the same thing). Do they have separate definitions or something? I have trie...

Collaborative Filtering: Non-Personalized item-to-item similarity

I'm trying to compute item-to-item similarity along the lines of Amazon's "Customers who viewed/purchased X have also viewed/purchased Y and Z". All of the examples and references I've seen are for either computing item similarity for ranked items, for finding user-user similarity, or for finding recommended items based on the current u...

Collaborative filtering in MySQL?

Hi, I'm trying to develop a site that recommends items (e.g. books) to users based on their preferences. So far, I've read O'Reilly's "Collective Intelligence" and numerous other online articles. They all, however, seem to deal with single instances of recommendation, for example if you like book A then you might like book B. What I'm tr...

Collaborative filtering in Rails

I'm looking for a solution for collaborative filtering in Rails, or even possible examples. So far I have only found acts_as_recommendable, which looks useful, but I noticed it hasn't had any updates in the last 2 years. Does anyone know of any other solutions and/or examples? ...

How to optimize a database suggestion engine

Hi, I'm making an online engine for item-to-item movie recommendations. I have done some research and I think the best way to implement it is to use Pearson correlation and build a table with item1, item2 and correlation fields. The problem is that after each rating of an item I have to regenerate the correlation for in the worst cas...

Converting python collaborative filtering code to use Map Reduce

Using Python, I'm computing cosine similarity across items. Given event data that represents a purchase (user,item), I have a list of all items 'bought' by my users. Given this input data (user,item): X,1 X,2 Y,1 Y,2 Z,2 Z,3, I build a Python dictionary {1: ['X','Y'], 2: ['X','Y','Z'], 3: ['Z']}. From that dictionary, I generate a...
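
As a single-machine baseline for the data in this excerpt, the sketch below rebuilds the item-to-buyers dictionary from the (user,item) events and computes cosine similarity for each item pair. It assumes binary purchase vectors, for which cosine reduces to |A ∩ B| / sqrt(|A| · |B|). The variable and function names are illustrative, and the Map Reduce conversion itself is not attempted.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# (user, item) purchase events from the excerpt.
events = [("X", 1), ("X", 2), ("Y", 1), ("Y", 2), ("Z", 2), ("Z", 3)]

# Invert the events into item -> set of users who bought it.
buyers = defaultdict(set)
for user, item in events:
    buyers[item].add(user)

def cosine(a, b):
    """Cosine similarity of two binary vectors represented as sets of users."""
    return len(a & b) / sqrt(len(a) * len(b))

# Similarity for every unordered item pair, keyed as (smaller_id, larger_id).
similarity = {
    (i, j): cosine(buyers[i], buyers[j])
    for i, j in combinations(sorted(buyers), 2)
}

for pair, score in similarity.items():
    print(pair, round(score, 3))
```

Items 1 and 2 share two buyers, items 2 and 3 share one, and items 1 and 3 share none, so their similarity is 0.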

How to prune data set by frequency to conform to paper's description

The MovieLens data set provides a table with columns: userid | movieid | tag | timestamp. I have trouble reproducing the way they pruned the MovieLens data set used in "Tag Informed Collaborative Filtering" by Zhen, Li and Yeung. In Section 4.1 (Data Set) of that paper, it says: "For the tagging information, we only keep those tags which ...

Efficient item similarity search using sphinx

Is it possible to perform document similarity search efficiently using sphinx search? My index consists of 500k documents, each of which is tagged by 5-30 different short, all-lowercase stemmed words, which is the data to search through. For simplicity, all tags in the database have equal weights and I'm not using phrase searching. My first a...

Recommendation system data collection methodology

I am building a recommendation system in my application and I am probably going to use Apache Mahout. I have to collect a big dataset, and it will be collected over a period of time. Which is less expensive: collecting it in some sort of log file, or collecting it in a DB and exporting it when I need it? ...

What are some ways for a recommendation engine to deal with one-time, novel and potentially important content?

Say you built a recommendation engine that would recommend live TV shows for you to watch. For regular shows, you could do a pretty good job using collaborative filtering and the like. But say it was something like the 1969 moon landing. It's obviously an important event, and you want your recommendation engine to handle that case. But y...