Goal: Suggest objects based on user's choices
Data: Table containing info on how users would order a subset of objects from the worst to the best; Example:
1 2 3 4 5 6
John: A B G J S O
Mary: A C G L
Joan: B C L J K
Stan: G J C L
There's about 20 times as many users as objects, every user's lineup contains 50-200 objects.
The table:
CREATE TABLE IF NOT EXISTS `pref` (
`usr` int(10) unsigned NOT NULL,
`obj` int(10) unsigned NOT NULL,
`ord` int(10) unsigned NOT NULL,
UNIQUE KEY `u_o` (`usr`,`obj`),
KEY `u` (`usr`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Basic idea: Iterate within user's objects starting from the second worst, building pairs (A > B); look for them in other users' lineups and list items better than A according to those users.
Query:
SELECT e.obj, COUNT(e.obj) AS rate
FROM pref a, pref b, pref c, pref d, pref e
WHERE a.usr = '222' # step 1: select a pair of objects A, B, where A is better than B according to user X
AND a.obj = '111'
AND b.usr = a.usr
AND b.ord < a.ord
AND c.obj = a.obj # step 2: find users thinking that object A is better than B
AND d.obj = b.obj
AND d.ord < c.ord
AND d.usr = c.usr
AND e.ord > c.ord # step 3: find objects better than A according to these users
AND e.usr = c.usr
GROUP BY e.obj
ORDER BY rate DESC;
Aliases:
a
object A ('111'), current user ('222')
b
object B, worse than A according to current user (has lower value of 'ord' than A)
c
object A in other user's lineup
d
object B in other user's lineup
e
object better than A in other user's lineup
The execution plan (ouo and uo being the indexes as suggested by Quassnoi):
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
| 1 | SIMPLE | a | ref | ouo,uo | ouo | 8 | const,const | 1 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | b | ref | ouo,uo | uo | 4 | const | 86 | Using where |
| 1 | SIMPLE | d | ref | ouo,uo | ouo | 4 | db.b.obj | 587 | Using index |
| 1 | SIMPLE | c | ref | ouo,uo | ouo | 8 | const,db.d.usr | 1 | Using where; Using index |
| 1 | SIMPLE | e | ref | uo | uo | 4 | db.d.usr | 80 | Using where |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
The query seems to work fine as long as the dataset is not too big; any ideas on how to streamline it to support larger datasets?