views:

732

answers:

7

What technology goes on behind the scenes of Amazon's recommendation system? I believe that Amazon's recommendations are currently the best in the market, but how do they provide us with such relevant recommendations?

Recently, we have been involved in a similar recommendation project, and we would like to know the ins and outs of the Amazon recommendation technology from a technical standpoint.

Any inputs would be highly appreciated.

Update:

This patent explains how personalized recommendations are done but it is not very technical, and so it would be really nice if some insights could be provided.

According to Dave's comments, Affinity Analysis forms the basis for this kind of recommendation engine. Here are also some good reads on the topic:

  1. Demystifying Market Basket Analysis
  2. Market Basket Analysis
  3. Affinity Analysis

Suggested Reading:

  1. Data Mining: Concepts and Techniques
A: 

I don't have any knowledge of Amazon's algorithm specifically, but one component of such an algorithm would probably involve tracking groups of items frequently ordered together, and then using that data to recommend other items in the group when a customer purchases some subset of the group.

Another possibility would be to track the frequency of item B being ordered within N days after ordering item A, which could suggest a correlation.
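The "frequently ordered together" idea can be sketched roughly as below. This is only a minimal illustration, not Amazon's method: the `orders` data and the function names `build_cooccurrence` and `recommend` are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical order history: each order is the set of item IDs bought together.
orders = [
    {"camera", "sd_card", "tripod"},
    {"camera", "sd_card"},
    {"camera", "case"},
    {"sd_card", "card_reader"},
]

def build_cooccurrence(orders):
    """Count, for every item, how often each other item appears in the same order."""
    together = defaultdict(Counter)
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            together[a][b] += 1
            together[b][a] += 1
    return together

def recommend(cart, together, top_n=3):
    """Rank items not in the cart by how often they were ordered with cart items."""
    scores = Counter()
    for item in cart:
        for other, count in together[item].items():
            if other not in cart:
                scores[other] += count
    return [item for item, _ in scores.most_common(top_n)]

together = build_cooccurrence(orders)
print(recommend({"camera"}, together))
```

The same counting approach would extend to the "item B within N days of item A" variant by pairing orders from the same customer by date instead of pairing items within one order.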

ElectricDialect
+7  A: 

This isn't directly related to Amazon's recommendation system, but it might be helpful to study the methods used by people who competed in the Netflix Prize, a contest to develop a better recommendation system using Netflix user data. A lot of good information exists in their community about data mining techniques in general.

The team that won used a blend of the recommendations generated by a lot of different models/techniques. I know that some of the main methods used were principal component analysis, nearest neighbor methods, and neural networks. Here are some papers by the winning team:

R. Bell, Y. Koren, C. Volinsky, "The BellKor 2008 Solution to the Netflix Prize", (2008).

A. Töscher, M. Jahrer, "The BigChaos Solution to the Netflix Prize 2008", (2008).

A. Töscher, M. Jahrer, R. Legenstein, "Improved Neighborhood-Based Algorithms for Large-Scale Recommender Systems", SIGKDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition (KDD'08), ACM Press (2008).

Y. Koren, "The BellKor Solution to the Netflix Grand Prize", (2009).

A. Töscher, M. Jahrer, R. Bell, "The BigChaos Solution to the Netflix Grand Prize", (2009).

M. Piotte, M. Chabbert, "The Pragmatic Theory solution to the Netflix Grand Prize", (2009).

The 2008 papers are from the first year's Progress Prize. I recommend reading the earlier ones first because the later ones build upon the previous work.
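To give a flavor of one of the methods mentioned above, here is a very small item-based nearest neighbor sketch using cosine similarity. It is not taken from any of the papers; the `ratings` data and the helper names are assumptions made for illustration, and the winning solutions were far more elaborate.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "C": 5},
}

def item_vectors(ratings):
    """Turn user -> item ratings into item -> {user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (missing = 0)."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_items(target, ratings, k=2):
    """Return the k items whose rating vectors are most similar to the target's."""
    vectors = item_vectors(ratings)
    sims = [
        (cosine(vectors[target], vec), other)
        for other, vec in vectors.items()
        if other != target
    ]
    return [other for _, other in sorted(sims, reverse=True)[:k]]

print(nearest_items("A", ratings))
```

Items rated similarly by the same users end up as neighbors, so a user who liked item A could be shown A's nearest neighbors.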

Justin Peel
What I like about this answer is that it points to the fact that there is no "perfect" answer and that people keep innovating in this area - there is always some room for improvement, and as times change and new methods are applied, the problem will keep getting solved differently. And if you read the detailed links you can see how there is a "blend" of several approaches to prediction within each of the big contenders for the prize. Great references.
Dave Quick
A: 

Someone did a presentation at our University on something similar last week, and referenced the Amazon recommendation system. I believe that it uses a form of K-Means Clustering to cluster people into their different buying habits. Hope this helps :)

Check this out too: http://www.almaden.ibm.com/cs/people/dmodha/ml02.ps and as HTML.

Chris Dennett
+1  A: 

As far as I know, it uses Case-Based Reasoning as its engine.

You can see this in these sources: here, here and here.

Searching Google for "amazon" and "case-based reasoning" turns up many more sources.

coelhudo
+6  A: 

Better yet, apply for the SDE opening on my team at Amazon (Personalization Platform). If you get hired then you'll get to learn how it works.

http://www.amazon.com/jobs/

Job Id: 109571

Jeff Bilger
Best answer ever?
awesomo
@Jeff - loved this answer. :-)
Dave Quick
@Jeff Bilger - But having learned how it works, she won't be able to tell the rest of us about it :-)
Stephen C
+11  A: 

It is both an art and a science. Typical fields of study revolve around market basket analysis (also called affinity analysis) which is a subset of the field of data mining. Typical components in such a system include identification of primary driver items and the identification of affinity items (accessory upsell, cross sell).
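For anyone new to market basket analysis, the usual measures for a rule like "people who buy X also buy Y" are support, confidence and lift. Here is a minimal sketch of how they are computed; the `baskets` data and the `rule_metrics` name are invented for the example, and real systems work on far larger data with smarter algorithms.

```python
# Hypothetical purchased baskets: one set of item IDs per checkout.
baskets = [
    {"camera", "sd_card"},
    {"camera", "sd_card", "tripod"},
    {"camera", "case"},
    {"sd_card"},
    {"book"},
]

def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    has_a = sum(1 for b in baskets if antecedent in b)
    has_c = sum(1 for b in baskets if consequent in b)
    has_both = sum(1 for b in baskets if antecedent in b and consequent in b)

    support = has_both / n                          # how common the pair is overall
    confidence = has_both / has_a if has_a else 0.0  # P(consequent | antecedent)
    # Lift > 1 means the pair occurs together more often than chance would suggest.
    lift = confidence / (has_c / n) if has_c else 0.0
    return support, confidence, lift

support, confidence, lift = rule_metrics(baskets, "camera", "sd_card")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this framing the "primary driver" would be the antecedent (the camera) and the "affinity item" the consequent (the SD card).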

Keep in mind the data sources they have to mine...
1) purchased shopping carts = real money from real people spent on real items = powerful data and a lot of it.
2) items added to carts but abandoned.
3) pricing experiments online (A/B testing, etc.) where they offer the same products at different prices and see the results
4) packaging experiments (A/B testing, etc.) where they offer different products in different "bundles" or discount various pairings of items
5) wishlists - what's on them specifically for you - and in aggregate it can be treated similarly to another stream of basket analysis data
6) referral sites (identification of where you came in from can hint other items of interest)
7) dwell times (how long before you click back and pick a different item)
8) ratings by you or those in your social network/buying circles - if you rate things you like you get more of what you like and if you confirm with the "i already own it" button they create a very complete profile of you
9) demographic information (your shipping address, etc.) - they know what is popular in your general area for your kids, yourself, your spouse, etc.
10) user segmentation = did you buy 3 books in separate months for a toddler? likely have a kid or more.. etc.
11) direct marketing click through data - did you get an email from them and click through? They know which email it was and what you clicked through on and whether you bought it as a result.
12) click paths in session - what did you view regardless of whether it went in your cart
13) # of times viewed an item before final purchase
14) if you're dealing with a brick and mortar store they might have your physical purchase history to go off of as well (i.e. toys r us or something that is online and also a physical store)
15) etc. etc. etc.

Luckily, people behave similarly in aggregate, so the more they know about the buying population at large, the better they know what will and won't sell. With every transaction and every rating, wishlist add, or browse, they learn how to tailor recommendations more personally. Keep in mind this is likely only a small sample of the full set of influences on what ends up in recommendations.

Now, I have no inside knowledge of how Amazon does business (I never worked there), and all I'm doing is talking about classical approaches to the problem of online commerce. I used to be the PM who worked on data mining and analytics for the Microsoft product called Commerce Server. We shipped in Commerce Server the tools that allowed people to build sites with similar capabilities, but the bigger the sales volume, the better the data and the better the model - and Amazon is BIG. I can only imagine how fun it is to play with models with that much data in a commerce-driven site. Many of those algorithms (like the predictor that started out in Commerce Server) have since moved on to live directly within Microsoft SQL.

The four big takeaways you should have are:
1) Amazon (or any retailer) is looking at aggregate data for tons of transactions and tons of people... this allows them to recommend pretty well even for anonymous users on their site.
2) Amazon (or any sophisticated retailer) is keeping track of the behavior and purchases of anyone who is logged in and using that to further refine on top of the mass aggregate data.
3) Often there is a means of overriding the accumulated data and taking "editorial" control of suggestions for product managers of specific lines (like some person who owns the 'digital cameras' vertical or the 'romance novels' vertical or similar) where they truly are experts.
4) There are often promotional deals (e.g. Sony or Panasonic or Nikon or Canon or Sprint or Verizon pays additional money to the retailer, or gives a better discount at larger quantities, or other things along those lines) that will cause certain "suggestions" to rise to the top more often than others - there is always some reasonable business logic and business reason behind this, targeted at making more on each transaction or reducing wholesale costs, etc.

In terms of actual implementation? Just about all large online systems boil down to some set of pipelines (or a filter pattern implementation or a workflow, etc. you call it what you will) that allow for a context to be evaluated by a series of modules that apply some form of business logic.

Typically a different pipeline would be associated with each separate task on the page - you might have one that does recommended "packages/upsells" (i.e. buy this with the item you're looking at) and one that does "alternatives" (i.e. buy this instead of the thing you're looking at) and another that pulls items most closely related from your wish list (by product category or similar).

The results of these pipelines can be placed on various parts of the page (above the scroll bar, below the scroll, on the left, on the right, different fonts, different size images, etc.) and tested to see which perform best. Because the business logic for these pipelines lives in easy plug-and-play modules, you end up with the moral equivalent of Lego blocks: you can pick and choose the business logic you want applied when you build another pipeline, which allows faster innovation, more experimentation, and in the end higher profits.
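To make the pipeline idea concrete, here is a rough sketch of a context being passed through a series of pluggable modules. Everything in it is hypothetical: the `Context` class, the module names, and the catalog data are invented for illustration and are not how Commerce Server or Amazon actually implement it.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Context:
    """State handed from module to module while building one page slot."""
    viewed_item: str
    candidates: List[str] = field(default_factory=list)

Module = Callable[[Context], Context]

# Hypothetical catalog data, standing in for real lookups.
ACCESSORIES = {"camera": ["sd_card", "tripod", "case"]}
OUT_OF_STOCK = {"tripod"}
PROMOTED = {"case"}

def add_accessories(ctx: Context) -> Context:
    """Seed candidates with accessories for the item being viewed."""
    ctx.candidates.extend(ACCESSORIES.get(ctx.viewed_item, []))
    return ctx

def drop_out_of_stock(ctx: Context) -> Context:
    """Business rule: never suggest something that cannot ship."""
    ctx.candidates = [c for c in ctx.candidates if c not in OUT_OF_STOCK]
    return ctx

def boost_promoted(ctx: Context) -> Context:
    """Business rule: promoted items move to the front (stable sort keeps the rest in order)."""
    ctx.candidates.sort(key=lambda c: c not in PROMOTED)
    return ctx

def run_pipeline(modules: List[Module], ctx: Context) -> Context:
    """Apply each module in order to the shared context."""
    for module in modules:
        ctx = module(ctx)
    return ctx

upsell_pipeline = [add_accessories, drop_out_of_stock, boost_promoted]
result = run_pipeline(upsell_pipeline, Context(viewed_item="camera"))
print(result.candidates)
```

A different page slot (say, "alternatives") would just be another list of modules, reusing whichever blocks apply.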

Did that help at all? I hope that gives you a little bit of insight into how this works in general for just about any ecommerce site - not just Amazon. Amazon (from talking to friends that have worked there) is very data driven and continually measures the effectiveness of its user experience and the pricing, promotion, packaging, etc. - they are a very sophisticated online retailer and are likely at the leading edge of a lot of the algorithms they use to optimize profit - and those are likely proprietary secrets (you know, like the formula for KFC's secret spices) and guarded as such.

Dave Quick
Yes. It did help me a lot and I really appreciate your inputs on the topic.
Rachel
A: 

I came across this paper today:

Maybe it provides additional information.

ewernli