views:

205

answers:

0

I was reading the great tagging article by Nitin Borwankar and he started me thinking of the ways to implement differnet levels of searches using two tables.

tags {
  id,
  tag
}

post_tags {
  id
  user_id
  post_id
  tag_id
}

I started with the simple example of T(U(i)) which means all tags of all users that have an item i. I was able to do it with the following SQL:

/* get all tags from the users found */
SELECT t.*, vt.* FROM verse_tags as vt
LEFT JOIN tags as t ON t.id = vt.tag_id
WHERE user_id in 
(
    /* Get all user_ids that have taged this item */
    SELECT user_id FROM verse_tags WHERE verse_id = 26046 GROUP BY user_id
)
GROUP BY t.id

Then I started with a slightly harder +1 level deep query. T(U(T(u))) which is tags of users using tags like user #.

/* Then get the tags of the user with tags like the user 3 */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE user_id in 
(
    /* Then get users with these tags */
    SELECT pt.user_id FROM post_tags as pt
    LEFT JOIN tags as t on t.id = pt.tag_id
    WHERE tag_id in
    (
        /* get tags of user */
        SELECT t.id FROM post_tags as pt
        LEFT JOIN tags as t ON t.id = pt.tag_id
        WHERE pt.user_id = 3
        GROUP BY t.id
    )
    GROUP BY user_id
)
GROUP BY t.id

However, it since I normally use JOIN's in my queries I am not sure how something like this could be optimized or what design flaws need to be avoided when using subqueries. I have even read that JOIN's should be used instead, but I have no idea how this would be accomplished with the above queries.

How could these queries be optimized?

UPDATE

1) Replaced GROUP BY with SELECT DISTINCT. (.74 sec)

2) Replace WHERE in with WHERE exists. (.40 sec)

3) Added indexes (oops!) (0.09 sec)

4) Back to WHERE in (0.08 sec)

EXPLAIN SELECT DISTINCT tag_id FROM post_tags WHERE user_id in
(
    SELECT DISTINCT user_id FROM post_tags WHERE tag_id in
    (
        SELECT DISTINCT tag_id FROM post_tags WHERE user_id = 3
    )
)

Running EXPLAIN gives me these results:

id  select_type     table       type        possible_keys   key key_len ref rows    Extra
1   PRIMARY         post_tags   index       NULL        tag_id  4   NULL    14  Using where
2   DEPENDENT SUBQUERY  post_tags   index_subquery  user_id     user_id 4   func    1   Using where
3   DEPENDENT SUBQUERY  post_tags   index_subquery  user_id,tag_id  tag_id  4   func    1   Using where