I'm currently building a system (PHP and MySQL) that lets users add "favorite music artists" to a list on their profile. I've been trying to figure out a way to compare one user's likes with other users' likes and return a list of "recommended friends".

For example:

User A Likes
- 1
- 2
- 3
- 4

User B Likes
- 1 <- A likes
- 5
- 6
- 7

User C Likes
- 1 <- A likes
- 2 <- A likes
- 8
- 9

Given this data, user A would get the following recommendations, closest match first:

User C
User B

My guess is that to do this I need a relational database, and that I need to standardize most of the user input.

So my questions are: What database structure is best for this kind of comparison? What kind of query should I use? (It doesn't need to be exact.)

+1  A: 

If you have an Artists table and a Users table, you can have a table FavoriteArtists with two foreign keys: the user, and the favorited artist.

Then find the other users who share favorites with the current user, and recommend as friends those whose overlap meets some threshold.
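To make the "threshold overlap" idea concrete, here is a minimal sketch using the example data from the question. It is written in Python against an in-memory SQLite database purely so it can be run as-is; the column types, the `recommend_friends` helper and the `min_overlap` parameter are my own assumptions, and the SQL should carry over to MySQL with little or no change.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FavoriteArtists (
        UserID   TEXT NOT NULL,
        ArtistID INTEGER NOT NULL,
        PRIMARY KEY (UserID, ArtistID)
    );
""")

# The example data from the question: users A, B and C.
likes = {"A": [1, 2, 3, 4], "B": [1, 5, 6, 7], "C": [1, 2, 8, 9]}
conn.executemany(
    "INSERT INTO FavoriteArtists (UserID, ArtistID) VALUES (?, ?)",
    [(user, artist) for user, artists in likes.items() for artist in artists],
)


def recommend_friends(conn, user_id, min_overlap=1):
    """Return (other_user, shared_count) pairs sharing at least min_overlap artists."""
    rows = conn.execute(
        """
        SELECT theirs.UserID, COUNT(*) AS shared
        FROM FavoriteArtists AS mine
        JOIN FavoriteArtists AS theirs
          ON theirs.ArtistID = mine.ArtistID
         AND theirs.UserID <> mine.UserID
        WHERE mine.UserID = ?
        GROUP BY theirs.UserID
        HAVING COUNT(*) >= ?
        ORDER BY shared DESC, theirs.UserID
        """,
        (user_id, min_overlap),
    )
    return rows.fetchall()


print(recommend_friends(conn, "A"))                 # [('C', 2), ('B', 1)]
print(recommend_friends(conn, "A", min_overlap=2))  # [('C', 2)]
```

With the default threshold, user A gets C first and then B, which matches the ordering asked for in the question. Raising `min_overlap` drops the weaker matches.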

John at CashCommons
+1  A: 

In SQL Server:

CREATE TABLE Users (
    UserID BIGINT IDENTITY(1,1) NOT NULL
    -- Other columns here
)

CREATE TABLE Artists (
    ArtistID BIGINT IDENTITY(1,1) NOT NULL
    -- Other columns
)

CREATE TABLE FavoriteArtists (
    UserID BIGINT,
    ArtistID BIGINT
)

Query to select users with the same likes:

SELECT DISTINCT f.UserID
FROM FavoriteArtists u, FavoriteArtists f
WHERE u.ArtistID = f.ArtistID
  AND u.UserID = @TARGET_USER
  AND f.UserID <> @TARGET_USER

Boogaloo
+2  A: 

A simple implementation could look like this

CREATE TABLE user_tbl(
    user_id BIGINT,
    ...
)

CREATE TABLE music_tbl(
    music_id BIGINT,
    ...
)

CREATE TABLE likes_tbl(
    user_id BIGINT,
    music_id BIGINT
)

To find all the users that have similar taste to a certain user we do this:

select u1.user_id, u2.user_id, count(*) as weight from likes_tbl u1, likes_tbl u2
where u1.music_id = u2.music_id and u1.user_id <> u2.user_id and u1.user_id = @user_id
group by u1.user_id, u2.user_id

The weight column is the number of artists that the two users have in common, so the higher the weight, the more similar their taste. You might then recommend the 5 users with the highest weight.
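The query above returns the weights but does not sort or limit them. A rough sketch of the "top 5" step follows, again in Python with an in-memory SQLite database so that it runs as written. The sample data, the `top_matches` helper and its `limit` parameter are made up for illustration; the `ORDER BY ... LIMIT` clause is the only real addition to the query.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; likes_tbl follows the schema above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes_tbl (user_id INTEGER, music_id INTEGER)")

# Hypothetical sample data: user 1 is the target user.
sample_likes = {
    1: [10, 11, 12, 13],
    2: [10, 11, 12],   # shares 3 with user 1
    3: [10, 20],       # shares 1 with user 1
    4: [11, 12, 30],   # shares 2 with user 1
    5: [40],           # shares nothing with user 1
}
conn.executemany(
    "INSERT INTO likes_tbl (user_id, music_id) VALUES (?, ?)",
    [(user, music) for user, musics in sample_likes.items() for music in musics],
)


def top_matches(conn, user_id, limit=5):
    """Return up to `limit` (other_user, weight) pairs, highest weight first."""
    rows = conn.execute(
        """
        SELECT u2.user_id, COUNT(*) AS weight
        FROM likes_tbl u1
        JOIN likes_tbl u2
          ON u1.music_id = u2.music_id
         AND u1.user_id <> u2.user_id
        WHERE u1.user_id = ?
        GROUP BY u2.user_id
        ORDER BY weight DESC, u2.user_id
        LIMIT ?
        """,
        (user_id, limit),
    )
    return rows.fetchall()


print(top_matches(conn, 1))           # [(2, 3), (4, 2), (3, 1)]
print(top_matches(conn, 1, limit=2))  # [(2, 3), (4, 2)]
```

Users with no artists in common (user 5 here) never appear, because the join produces no rows for them.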

This can be extended in different ways. One possibility is to add a genre_id to the music_tbl and likes_tbl and then do the join on genre_id.
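As a sketch of the genre idea, the example below stores a genre_id on each like and joins on it instead of music_id. It is Python with in-memory SQLite so it can be run directly. The sample data and the `genre_matches` helper are hypothetical, and counting distinct shared genres (rather than every matching pair of likes) is my own choice, not something the schema dictates.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; genre_id on likes_tbl is the extension.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE likes_tbl (user_id INTEGER, music_id INTEGER, genre_id INTEGER)"
)

# Hypothetical data: no two users like the same artist, but genres overlap.
conn.executemany(
    "INSERT INTO likes_tbl (user_id, music_id, genre_id) VALUES (?, ?, ?)",
    [
        (1, 10, 1), (1, 11, 2),   # user 1: genres 1 and 2
        (2, 20, 1), (2, 21, 1),   # user 2: genre 1 only
        (3, 30, 1), (3, 31, 2),   # user 3: genres 1 and 2
        (4, 40, 3),               # user 4: genre 3 only
    ],
)


def genre_matches(conn, user_id):
    """Return (other_user, shared_genres) pairs, most shared genres first."""
    rows = conn.execute(
        """
        SELECT u2.user_id, COUNT(DISTINCT u2.genre_id) AS shared_genres
        FROM likes_tbl u1
        JOIN likes_tbl u2
          ON u1.genre_id = u2.genre_id
         AND u1.user_id <> u2.user_id
        WHERE u1.user_id = ?
        GROUP BY u2.user_id
        ORDER BY shared_genres DESC, u2.user_id
        """,
        (user_id,),
    )
    return rows.fetchall()


print(genre_matches(conn, 1))  # [(3, 2), (2, 1)]
```

Joining on genre finds matches even when users have no artists in common, at the cost of a much looser notion of "similar taste".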

Mike J
Might want to explain that a little bit...
Agent_9191
Is that better?
Mike J
+1  A: 

Tables

User
userid int
FirstName varchar(30)
LastName varchar(30)

Song
songid int
Title varchar(30)
Artist varchar(30)

UserSong
userid int
songid int

Query

select distinct User.userid, User.FirstName, User.LastName
from UserSong
inner join Song
on UserSong.songid=Song.songid
inner join User
on UserSong.userid=User.userid
where Song.Artist='Some Artist'

Less Verbose Query Using a Natural Join

select distinct User.userid, User.FirstName, User.LastName
from UserSong
natural join Song
natural join User
where Song.Artist='Some Artist'

(Note that I haven't tested this one yet. Someone correct me if I'm wrong.)

The above query will give you a list of all users who "like" the given artist. You can then use that list to show other users who else likes what they do.

Michael Todd
Why not just use natural joins?
Morningcoffee
Natural joins? Example?
Michael Todd
Oh: http://en.wikipedia.org/wiki/Join_%28SQL%29#Natural_join Didn't know you could do that. Nice!
Michael Todd
Oh, that's why I don't know about it. I only use MS SqlServer and they don't use that syntax. Works with Oracle, though.
Michael Todd
+4  A: 

Not directly an answer to your question but you might wish to check out the book Programming Collective Intelligence. Based on your question I think you'd find it very helpful.

Hans Lawrenz
Excellent book. The formulas and algorithms are just perfect for what I need.
kuroir
+2  A: 

Not to duplicate what's already been posted, but...

--
-- Working MySQL implementation of a "user compatibility" schema.
--


DROP TABLE IF EXISTS favourite;
DROP TABLE IF EXISTS artist;
DROP TABLE IF EXISTS users;


CREATE TABLE users (
 user_id INT NOT NULL AUTO_INCREMENT,
 name VARCHAR(32),
 PRIMARY KEY (user_id)
);


CREATE TABLE artist (
 artist_id INT NOT NULL AUTO_INCREMENT,
 name VARCHAR(32),
 PRIMARY KEY (artist_id)
);


CREATE TABLE favourite (
 favourite_id INT NOT NULL AUTO_INCREMENT,
 user_id INT NOT NULL,
 artist_id INT NOT NULL,
 UNIQUE (user_id, artist_id),
 PRIMARY KEY (favourite_id),
 FOREIGN KEY (user_id) REFERENCES users (user_id) ON DELETE CASCADE,
 FOREIGN KEY (artist_id) REFERENCES artist (artist_id) ON DELETE CASCADE
);


INSERT INTO users
 (name)
VALUES
 ("Alice"),
 ("Bob"),
 ("Carol"),
 ("Dave")
;


INSERT INTO artist
 (name)
VALUES
 ("Jewel"),
 ("Sarah McLachlan"),
 ("Britney Spears"),
 ("David Bowie"),
 ("The Doors")
;


INSERT INTO favourite
 (user_id, artist_id)
VALUES
 (
  (SELECT user_id FROM users WHERE name = "Alice"),
  (SELECT artist_id FROM artist WHERE name = "Jewel")
 ),
 (
  (SELECT user_id FROM users WHERE name = "Alice"),
  (SELECT artist_id FROM artist WHERE name = "Sarah McLachlan")
 ),
 (
  (SELECT user_id FROM users WHERE name = "Bob"),
  (SELECT artist_id FROM artist WHERE name = "Jewel")
 ),
 (
  (SELECT user_id FROM users WHERE name = "Bob"),
  (SELECT artist_id FROM artist WHERE name = "Sarah McLachlan")
 ),
 (
  (SELECT user_id FROM users WHERE name = "Bob"),
  (SELECT artist_id FROM artist WHERE name = "Britney Spears")
 ),
 (
  (SELECT user_id FROM users WHERE name = "Bob"),
  (SELECT artist_id FROM artist WHERE name = "David Bowie")
 ),
 (
  (SELECT user_id FROM users WHERE name = "Carol"),
  (SELECT artist_id FROM artist WHERE name = "David Bowie")
 ),
 (
  (SELECT user_id FROM users WHERE name = "Carol"),
  (SELECT artist_id FROM artist WHERE name = "The Doors")
 ),
 (
  (SELECT user_id FROM users WHERE name = "Dave"),
  (SELECT artist_id FROM artist WHERE name = "Jewel")
 ),
 (
  (SELECT user_id FROM users WHERE name = "Dave"),
  (SELECT artist_id FROM artist WHERE name = "The Doors")
 )
;


SELECT
 t0.user_id myuser,
 t1.user_id friend,
 COUNT(*)
FROM favourite t0
JOIN favourite t1 ON t1.artist_id = t0.artist_id
WHERE t0.user_id != t1.user_id
GROUP BY t0.user_id, t1.user_id;


--
-- The same thing, but returning names!
--

SELECT
 t0u.name myuser,
 t1u.name friend,
 COUNT(*)
FROM favourite t0
JOIN favourite t1 ON t1.artist_id = t0.artist_id
JOIN users t0u ON t0u.user_id = t0.user_id
JOIN users t1u ON t1u.user_id = t1.user_id
WHERE t0.user_id != t1.user_id
GROUP BY t0.user_id, t1.user_id;

Good luck!

Ryan McConahy