views:

34

answers:

3

I want to find items in common from the "following_list" column in a table of users:

+----+--------------------+-------------------------------------+
| id | name               | following_list                      |
+----+--------------------+-------------------------------------+
|  9 | User 1             | 26,6,12,10,21,24,19,16              | 
| 10 | User 2             | 21,24                               | 
| 12 | User 3             | 9,20,21,26,30                       | 
| 16 | User 4             | 6,52,9,10                           | 
| 19 | User 5             | 9,10,6,24                           | 
| 21 | User 6             | 9,10,6,12                           | 
| 24 | User 7             | 9,10,6                              | 
| 46 | User 8             | 45                                  | 
| 52 | User 9             | 10,12,16,21,19,20,18,17,23,25,24,22 | 
+----+--------------------+-------------------------------------+

I was hoping to be able to sort by the number of matches for a given user id. For example, I want to match all users except #9 against #9 to see which of the IDs in the "following_list" column they have in common.

I found a way of doing this through the "SET" datatype and some bit trickery:
http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html#bits

However, I need to do this on an arbitrary list of IDs. I was hoping this could be done entirely through the database, but this is a little out of my league. Any bit gurus out there?

Thanks, Landon


EDIT: Thanks for the help everybody. I'm still curious as to whether a bit-based approach could work, but the 3-table join works nicely.

SELECT a.following_id, COUNT( c.following_id ) AS matches
FROM following a
LEFT JOIN following b ON b.user_id = a.following_id
LEFT JOIN following c ON c.user_id = a.user_id
  AND c.following_id = b.following_id
WHERE a.user_id = ?
GROUP BY a.following_id

Now I have to keep convincing myself not to prematurely optimize. Thanks, gents.

+1  A: 

If you normalised your following_list column into a separate table with user_id and follower_id, then you'd find that COUNT() was extremely easy to use. You'd also find the logic for selecting a list of followers, or a list of user's being followed much easier

Mark Baker
+1 - I always cringe when I see a schema with columns like that...
Eric Petroelje
The schema is as you have suggested, and this was simply an experiment with what I found in the MySQL article concerning the bit strategy for items in common.I understand the logic, I just want to be able to join this as a "social relevance" factor on other tables with as little handled within the application itself as possible.So, in one database query, how could I get the list of "followee's in common" I'm looking for? (Use your assumed schema, as mine's far too complicated to try to explain here)
A: 

Normalize the table, drop the column following_list, create a table following:

user_id
following_id

Which leads to the easy-peasy query (untested, you get the point):

SELECT b.user_id, COUNT(c.following)
FROM following a
JOIN following b -- get followings of <id> 
ON b.following_id = a.following_id
AND b.user_id = a.following_id
JOIN following c -- get all (other) followings of <id> again, match with followings of b
ON b.following_id = c.following_id
AND c.user_id = a.user_id
WHERE a.user_id = <id>
GROUP BY b.user_id
ORDER BY COUNT(b.following) DESC

Performance may very well very based on indexes & size of dataset, maybe add a 'similarity' column which is updated at regular intervals or changes just for fast data retrieval.

Wrikken
This was my first approach. The problem is two-fold:1) I'm only interested in people <id> is following (MAX number of potential matches). 2) For each person in that group, I want to find how many of the others he/she is following too. Can I do this with the join approach you've described?
(1) is easily done, (2) puzzles me: are 'having the following list in common' and 'how many of of the others (s)he is following' not equivalent?
Wrikken
(1) implemented in answer btw.
Wrikken
Got it. Did left joins so that it'd still include the ones with none in common. Thanks for your help!
+1  A: 

Your problem would be simplified if you could split your following_list column off into a child table, e.g.

TABLE id_following_list:

id | following
--------------
10 | 21
10 | 24
46 | 45
...| ...

You can read more here.

VeeArr