Having implemented an algorithm to recommend products with some success, I'm now looking at ways to calculate the initial input data for this algorithm.

My objective is to calculate a score for each product that a user has some sort of history with.

The data I am currently collecting:

  • User order history
  • Product pageview history for both anonymous and registered users

All of this data is timestamped.

What I'm looking for

There are a couple of things I'm looking for suggestions on, and ideally this question should be treated as a discussion rather than as a search for a single 'right' answer.

  • Any additional data I can collect for a user that can directly imply an interest in a product
  • Algorithms/equations for turning this data into scores for each product

What I'm NOT looking for

Just to avoid this question being derailed with the wrong kind of answers, here is what I'm doing once I have this data for each user:

  • Generating a number of user clusters (21 at the moment) using the k-means clustering algorithm, with the Pearson correlation coefficient as the distance score
  • For each user (on demand), calculating a graph of similar users by looking for their most and least similar users within their cluster, and repeating to an arbitrary depth
  • Calculating a score for each product based on the preferences of other users within the user's graph
  • Sorting the scores to return a list of recommendations

Basically, I'm not looking for ideas on what to do once I have the input data (I may need further help with that later, but it's not the point of this question), just for ideas on how to generate this input data in the first place.

+1  A: 
  1. You can allow users to explicitly state their preferences, the way Netflix allows users to assign stars.
  2. You can assign a positive numeric value to everything they bought, since you say you have their purchase history. Assign zero to everything they didn't buy.
  3. You could use a weighted value for the things they bought, adjusted for what's popular. (If nearly everybody bought a product, the fact that one person also bought it doesn't tell you much about them.) See "term frequency–inverse document frequency".
  4. You could also assign a smaller numeric value to items that users looked at but did not buy.
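
Points 2 to 4 could be combined roughly as in the sketch below. It is only an illustration: the weights `PURCHASE_WEIGHT` and `VIEW_WEIGHT`, the function names, and the `1 + log(N / n)` smoothing are assumptions picked for the example, not values taken from any real system, and the input is assumed to be already reduced to one set of product ids per user.

```python
import math
from collections import defaultdict

# Hypothetical base weights: a purchase counts for more than a page view.
PURCHASE_WEIGHT = 1.0
VIEW_WEIGHT = 0.25


def popularity_weight(buyer_count, total_users):
    """IDF-style weight: rarely bought products get a larger weight.

    The leading 1.0 is one possible smoothing choice, so that a product
    bought by every user still scores above a mere page view.
    """
    return 1.0 + math.log(total_users / buyer_count)


def score_products(purchases, views):
    """Build per-user product scores.

    purchases, views: dicts mapping user id -> set of product ids.
    Returns a dict mapping user id -> {product id: score}. Products a
    user has no history with are left out, which means a score of zero.
    """
    users = set(purchases) | set(views)
    total_users = len(users)

    # Count how many distinct users bought each product.
    buyers = defaultdict(int)
    for products in purchases.values():
        for product in products:
            buyers[product] += 1

    scores = {}
    for user in users:
        bought = purchases.get(user, set())
        viewed_only = views.get(user, set()) - bought
        user_scores = {}
        for product in bought:
            user_scores[product] = PURCHASE_WEIGHT * popularity_weight(
                buyers[product], total_users
            )
        for product in viewed_only:
            user_scores[product] = VIEW_WEIGHT
        scores[user] = user_scores
    return scores


purchases = {"u1": {"a", "b"}, "u2": {"a"}, "u3": {"a"}}
views = {"u1": {"c"}, "u2": {"a", "b"}, "u4": {"a"}}
scores = score_products(purchases, views)
```

Here product `a` was bought by three of the four users, so it contributes less to `u1`'s profile than product `b`, which only `u1` bought. Explicit ratings (point 1) could be stored in the same structure, taking precedence over the implied scores wherever a user has supplied one.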
nont