views:

53

answers:

1

Goal: Suggest objects based on user's choices

Data: Table containing info on how users would order a subset of objects from the worst to the best; Example:

          1 2 3 4 5 6
    John: A B G J S O
    Mary: A C G L
    Joan: B C L J K
    Stan: G J C L

There's about 20 times as many users as objects, every user's lineup contains 50-200 objects.

The table:

CREATE TABLE IF NOT EXISTS `pref` (
  `usr` int(10) unsigned NOT NULL,
  `obj` int(10) unsigned NOT NULL,
  `ord` int(10) unsigned NOT NULL,
  UNIQUE KEY `u_o` (`usr`,`obj`),
  KEY `u` (`usr`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Basic idea: Iterate within user's objects starting from the second worst, building pairs (A > B); look for them in other users' lineups and list items better than A according to those users.

Query:

SELECT e.obj, COUNT(e.obj) AS rate
FROM pref a, pref b, pref c, pref d, pref e

WHERE a.usr = '222' # step 1: select a pair of objects A, B, where A is better than B according to user X
AND a.obj = '111'
AND b.usr = a.usr
AND b.ord < a.ord

AND c.obj = a.obj # step 2: find users thinking that object A is better than B
AND d.obj = b.obj
AND d.ord < c.ord
AND d.usr = c.usr 

AND e.ord > c.ord # step 3: find objects better than A according to these users
AND e.usr = c.usr

GROUP BY e.obj
ORDER BY rate DESC;

Aliases:
a object A ('111'), current user ('222')
b object B, worse than A according to current user (has lower value of 'ord' than A)
c object A in other user's lineup
d object B in other user's lineup
e object better than A in other user's lineup

The execution plan (ouo and uo being the indexes as suggested by Quassnoi):

+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref                 | rows | Extra                                        |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
|  1 | SIMPLE      | a     | ref  | ouo,uo        | ouo  | 8       | const,const         |    1 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | b     | ref  | ouo,uo        | uo   | 4       | const               |   86 | Using where                                  | 
|  1 | SIMPLE      | d     | ref  | ouo,uo        | ouo  | 4       | db.b.obj            |  587 | Using index                                  | 
|  1 | SIMPLE      | c     | ref  | ouo,uo        | ouo  | 8       | const,db.d.usr      |    1 | Using where; Using index                     | 
|  1 | SIMPLE      | e     | ref  | uo            | uo   | 4       | db.d.usr            |   80 | Using where                                  | 
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+

The query seems to work fine as long as the dataset is not too big; any ideas on how to streamline it to support larger datasets?

+3  A: 

The query is fine, just create the following indexes:

pref (obj, usr, ord)
pref (usr, ord)

Update:

Try this syntax.

The rating system is simpler but quite similar: it gives almost same rating on the test random results I created.

SELECT  oa.obj, SUM(weight) AS rate
FROM    (
        SELECT  usr, ord,
                (
                SELECT  COUNT(*)
                FROM    pref a
                JOIN    pref ob
                ON      ob.obj = a.obj
                WHERE   ob.usr = o.usr
                        AND a.usr = 50
                        AND a.ord <
                        (
                        SELECT  ord
                        FROM    pref ai
                        WHERE   ai.usr = 50
                                AND ai.obj = 75
                        )
                        AND ob.ord < o.ord
                ) AS weight
        FROM    pref o
        WHERE   o.obj = 75
        HAVING  weight >= 0
        ) ow
JOIN    pref oa
ON      oa.usr = ow.usr
        AND oa.ord > ow.ord
GROUP BY
        oa.obj
ORDER BY
        rate DESC

This query gives the weight to each item rated higher than A by all users who rated A.

The weight is equal to the number of items rated below A by both users.

Quassnoi
Thanks! It's too bad though. For a user with 150 objects in a table with 250k rows it takes around 8-9 seconds (returning 376 rows). :/I tried changing the table type from MyISAM to MEMORY, but for some reason it doesn't want to use multi-column indexes then. Weird.
Mike