views:

199

answers:

6

I've had a bit of a search, but didn't find anything quite like what I'm trying to achieve.

Basically, I'm trying to find a similarity between two users' voting habits.

I have a table storing each individual vote made, which stores:

voteID
itemID     (the item the vote is attached to)
userID     (the user who voted)
direction  (whether the user voted the post up, or down)

I'm aiming to calculate the similarity between, say, users A and B, by finding out two things:

  1. The number of votes they have in common. That is, the number of times they've both voted on the same post (the direction does not matter at this point).
  2. The number of times they've voted in the same direction, on common votes.

(Then simply to calculate #2 as a percentage of #1, to achieve a crude similarity rating).

My question is, how do I find the intersection between the two users' sets of votes? (i.e. how do I calculate point #1 adequately, without looping over every vote in a highly inefficient way.) If they were in different tables, an INNER JOIN would suffice, I'd imagine... but that obviously won't work on the same table (or will it?).

Any ideas would be greatly appreciated.

+1  A: 

You most certainly can join a table to itself. In fact, that's what you're going to have to do. You must use aliasing when joining a table to itself. If your table doesn't have a PK or FK, you'll have to use Union instead. Union will remove duplicates and Union All will not.

GregD
But how do you go about joining it on the same column with the differing user IDs? Any chance you could give an example - as I'm obviously overlooking something, here...
James Burgess
I was going to, but ninesided beat me too it.
GregD
my bad, your post appeared halfway through me typing mine!
ninesided
no problems. Your post was good! :)
GregD
+4  A: 

Something like this:

SELECT COUNT(*)
FROM votes v1
INNER JOIN votes v2 ON (v1.item_id = v2.item_id)
WHERE v1.userID = 'userA'
AND v2.userUD = 'userB'
ninesided
Just what I needed, thanks.
James Burgess
+2  A: 

Here's an example that should get you closer:

SELECT COUNT(*)
FROM (
      SELECT u1.userID
      FROM vote u1, vote u2
      WHERE u1.itemID = u2.itemID
      AND u1.userID = user1
      AND u2.userID = user2)
Lance Roberts
old school join syntax
ninesided
That could be, you're welcome to point me out to some article on joins that explains why one way is better than the other. I don't have tons of SQL experience.
Lance Roberts
It's all about readability and ease of comprehension. Check out this answer: http://bit.ly/ansijoin
ninesided
Thanks, that was a good link. I'll upvote your answer.
Lance Roberts
+2  A: 

Assuming userID 1 being compared to userID 2

For finding how many votes they have in common:

SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
                            AND v2.itemID = v1.itemID)
WHERE v1.userID = 1;

For finding when they also voted the same:

SELECT COUNT(*)
FROM Votes AS v1
INNER JOIN Votes AS v2 ON (v2.userID = 2
                            AND v2.itemID = v1.itemID
                            AND v2.direction = v1.direction)
WHERE v1.userID = 1;
Chad Birch
+2  A: 

A self join is in order. Here it is with all you asked:

SELECT v1.userID user1, v2.userID user2,
  count(*) n_votes_in_common,
  sum(case when v1.direction = v2.direction then 1 else 0 end) n_votes_same_direction,
  (n_votes_same_direction * 100.0 / n_votes_in_common) crude_similarity_percent 
FROM votes v1
INNER JOIN votes v2
ON v1.item_id = v2.item_id
Carlos A. Ibarra
+2  A: 

In case you want to do this for a single user (rather than knowing both users at the start) to find to whom they are the closest match:

SELECT
     v2.userID,
     COUNT(*) AS matching_items,
     SUM(CASE WHEN v2.direction = v1.direction THEN 1 ELSE 0 END) AS matching_votes
FROM
     Votes v1
INNER JOIN Votes v2 ON
     v2.userID <> v1.userID AND
     v2.itemID = v1.itemID
WHERE
     v1.userID = @userID
GROUP BY
     v2.userID

You can then limit that however you see fit (return the top 10, top 20, all, etc.)

I haven't tested this yet, so let me know if it doesn't act as expected.

Tom H.
That's actually really helpful, thanks :)
James Burgess