I'm reading the book Programming Collective Intelligence and trying to apply what I've learned against the Northwind database. Though I'm not yet confident in my understanding of the algorithms presented, I'm starting to get an idea of the general concepts.
Using the Northwind database, I'm attempting to show a list of "customers who bought this also bought XYZ" using the following pseudo-logic:
- Find other customers who also purchased my item
- Find all other items purchased by those customers
- Rank items based on purchase count
- Return top N items from previous step
I'm working with the query below:
declare
@customerid nchar(5),
@productid int;
set @customerid = 'ALFKI';
set @productid = 59;
-- find other products from customers who
-- also purchased my productid
select top 10
od.productid, c.categoryname, p.productname, p.unitsonorder, count(od.productid)
from
[order details] od
inner join orders o on o.orderid = od.orderid
inner join products p on p.productid = od.productid
inner join categories c on c.categoryid = p.categoryid
where
o.customerid <> @customerid and
od.productid <> @productid and
p.discontinued = 0
group by
od.productid, c.categoryname, p.productname, p.unitsonorder
order by 5 desc,4 desc
I think my next step would be to break the query up so that I can filter based on recent purchases (as opposed to all historical purchases) and limit customer matching to N customers, rather than ALL customers who purchased my product. Can anyone provide any pointers? Am I headed in the right direction? Should I take a different direction entirely?
At this point, my goal is performance over accuracy, as I know I do not yet have the experience to apply the algorithms to their fullest benefit. I'm merely trying to apply the concept. Once I'm satisfied I understand it well enough, I intend to test this query against a larger database with more realistic customer data.