views:

133

answers:

1

Hi

I'm trying to develop a site that recommends items(fx. books) to users based on their preferences. So far, I've read O'Reilly's "Collective Intelligence" and numerous other online articles. They all, however, seem to deal with single instances of recommendation, for example if you like book A then you might like book B.

What I'm trying to do is to create a set of 'preference-nodes' for each user on my site. Let's say a user likes book A,B and C. Then, when they add book D, I don't want the system to recommend other books based solely other users experience with book D. I wan't the system to look up similar 'preference-nodes' and recommend books based on that.

Here's an example of 4 nodes:

User1: 'book A'->'book B'->'book C'
User2: 'book A'->'book B'->'book C'->'book D'
user3: 'book X'->'book Y'->'book C'->'book Z'
user4: 'book W'->'book Q'->'book C'->'book Z'

So a recommendation system, as described in the material I've read, would recommend book Z to User 1, because there are two people who recommends Z in conjuction with liking C (ie. Z weighs more than D), even though a user with a similar 'preference-node', User2, would be more qualified to recommend book D because he has a more similar interest-pattern.

So does any of you have any experience with this sort of thing? Is there some things I should try to read or does there exist any open source systems for this?

Thanks for your time!

Small edit: I think last.fm's algorithm is doing exactly what I my system to do. Using the preference-trees of people to recommmend music more personally to people. Instead of just saying "you might like B because you liked A"

+1  A: 

Create a table and insert the test data:

CREATE TABLE `ub` (
  `user_id` int(11) NOT NULL,
  `book_id` varchar(10) NOT NULL,
  PRIMARY KEY (`user_id`,`book_id`),
  UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into ub values (1, 'A'), (1, 'B'), (1, 'C');
insert into ub values (2, 'A'), (2, 'B'), (2, 'C'), (2,'D');
insert into ub values (3, 'X'), (3, 'Y'), (3, 'C'), (3,'Z');
insert into ub values (4, 'W'), (4, 'Q'), (4, 'C'), (4,'Z');

Join the test data onto itself by book_id, and create a temporary table to hold each user_id and the number of books it has in common with the target user_id:

create temporary table ub_rank as 
select similar.user_id,count(*) rank
from ub target 
join ub similar on target.book_id= similar.book_id and target.user_id != similar.user_id
where target.user_id = 1
group by similar.user_id;

select * from ub_rank;
+---------+------+
| user_id | rank |
+---------+------+
|       2 |    3 |
|       3 |    1 |
|       4 |    1 |
+---------+------+
3 rows in set (0.00 sec)

We can see that user_id has 3 in common with user_id 1, but user_id 3 and user_id 4 only have 1 each.

Next, select all the books that the users in the temporary table have that do not match the target user_id's books, and arrange these by rank. Note that the same book might appear in different user's lists, so we sum the rankings for each book so that common books get a higher ranking.

select similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id 
left join ub target on target.user_id = 1 and target.book_id = similar.book_id
where target.book_id is null
group by similar.book_id
order by total_rank desc;

+---------+------------+
| book_id | total_rank |
+---------+------------+
| D       |          3 |
| Z       |          2 |
| X       |          1 |
| Y       |          1 |
| Q       |          1 |
| W       |          1 |
+---------+------------+
6 rows in set (0.00 sec)

Book Z appeared in two user lists, and so was ranked above X,Y,Q,W which only appeared in one user's list. Book D did best because it appeared in user_id 2's list, which had 3 items in common with target user_id 1.

Martin
Wow, this is a really comprehensive response. Thank you very much!
soren.qvist