views: 411
answers: 1
So a recent question made me aware of the rather cool apriori algorithm. I can see why it works, but what I'm not sure about is practical uses. Presumably the main reason to compute related sets of items is to be able to provide recommendations for someone based on their own purchases (or owned items, etcetera). But how do you go from a set of related sets of items to individual recommendations?

The Wikipedia article finishes:

The second problem is to generate association rules from those large itemsets, subject to a minimum-confidence constraint. Suppose one of the large itemsets is Lk = {I1, I2, …, Ik}; association rules for this itemset are generated in the following way: the first rule is {I1, I2, …, Ik-1} ⇒ {Ik}, and by checking its confidence this rule can be determined as interesting or not. Further rules are then generated by deleting the last item of the antecedent and inserting it into the consequent, and the confidences of the new rules are checked to determine their interestingness. This process iterates until the antecedent becomes empty.
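(For concreteness, here is a small sketch of that rule-generation step in Python. It is slightly more general than the quote's delete-the-last-item chain, in that it tries every non-empty proper subset as an antecedent; the `support` lookup and all names are illustrative, not from the article.)

```python
from itertools import combinations

def generate_rules(large_itemset, support, min_confidence):
    """Yield (antecedent, consequent, confidence) rules meeting min_confidence.

    `support` is assumed to be a callable returning the support of an
    itemset, e.g. a lookup into counts from the frequent-itemset phase.
    """
    items = frozenset(large_itemset)
    rules = []
    # Try each non-empty proper subset as an antecedent, largest first.
    for r in range(len(items) - 1, 0, -1):
        for antecedent in map(frozenset, combinations(items, r)):
            consequent = items - antecedent
            confidence = support(items) / support(antecedent)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules
```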

I'm not sure how the set of association rules helps in determining the best set of recommendations either, though. Perhaps I'm missing the point, and apriori is not intended for this use? In which case, what is it intended for?

+1  A: 

So the Apriori algorithm is no longer the state of the art for Market Basket Analysis (aka Association Rule Mining). The techniques have improved, though the Apriori principle (that the support of a subset upper bounds the support of the set containing it) is still a driving force.

In any case, the way association rules are used to generate recommendations is that, given some history itemset, we check each rule's antecedent to see if it is contained in the history. If so, we can recommend the rule's consequent (eliminating cases where the consequent is already contained in the history, of course).
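That matching step is essentially a subset test per rule; a toy sketch (rule and item names here are made up for illustration):

```python
def recommend(history, rules):
    """Collect consequents of rules whose antecedent is in the history.

    `rules` is assumed to be a list of (antecedent, consequent) frozenset
    pairs produced by the mining phase.
    """
    history = frozenset(history)
    recs = set()
    for antecedent, consequent in rules:
        if antecedent <= history:           # antecedent contained in history
            recs |= consequent - history    # skip items already in the history
    return recs
```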

We can use various metrics to rank our recommendations, since with a multitude of rules we may have many hits when comparing them to a history, and we can only make a limited number of recommendations. Some useful metrics are the support of a rule (which is the same as the support of the union of the antecedent and the consequent), the confidence of a rule (the support of the rule over the support of the antecedent), and the lift of a rule (the support of the rule over the product of the supports of the antecedent and the consequent), among others.

rampion
Comparing every rule against a given user's set seems extremely inefficient - isn't this a performance issue? Is it possible to optimize this?
Nick Johnson
Well, you can sort your rules by the metrics once, and then, for each history, stop checking after sufficient rules match the history. Alternately, you could create lattices of rules by antecedent, and only check the child rules if their parent rules match. You could cluster the rules by antecedent too.
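The first idea (sort once, stop early) might look like this; a hypothetical sketch, assuming rules come pre-scored and sorted descending by whatever metric you prefer:

```python
def top_recommendations(history, scored_rules, k):
    """Return up to k items from the best-scoring matching rules.

    `scored_rules` is assumed to be a list of (score, antecedent, consequent)
    tuples, already sorted by score in descending order.
    """
    history = frozenset(history)
    recs = []
    for score, antecedent, consequent in scored_rules:
        if antecedent <= history:
            for item in consequent - history:
                if item not in recs:       # keep first (highest-scoring) hit
                    recs.append(item)
            if len(recs) >= k:             # stop early once we have enough
                break
    return recs[:k]
```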
rampion
Good ideas. Ranking seems like it would still only reduce the amount of work by a constant factor - most rules won't match. Indexing seems a better idea, though I'm not aware of any indexes that work well for finding entries that are a subset of the search query.
Nick Johnson