I was reading the great tagging article by Nitin Borwankar and he started me thinking of the ways to implement differnet levels of searches using two tables.
tags {
id,
tag
}
post_tags {
id
user_id
post_id
tag_id
}
I started with the simple example of T(U(i))
which means all tags of all users that have an item i. I was able to do it with the following SQL:
/* get all tags from the users found */
SELECT t.*, vt.* FROM verse_tags as vt
LEFT JOIN tags as t ON t.id = vt.tag_id
WHERE user_id in
(
/* Get all user_ids that have taged this item */
SELECT user_id FROM verse_tags WHERE verse_id = 26046 GROUP BY user_id
)
GROUP BY t.id
Then I started with a slightly harder +1 level deep query. T(U(T(u)))
which is tags of users using tags like user #.
/* Then get the tags of the user with tags like the user 3 */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE user_id in
(
/* Then get users with these tags */
SELECT pt.user_id FROM post_tags as pt
LEFT JOIN tags as t on t.id = pt.tag_id
WHERE tag_id in
(
/* get tags of user */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE pt.user_id = 3
GROUP BY t.id
)
GROUP BY user_id
)
GROUP BY t.id
However, it since I normally use JOIN's in my queries I am not sure how something like this could be optimized or what design flaws need to be avoided when using subqueries. I have even read that JOIN's should be used instead, but I have no idea how this would be accomplished with the above queries.
How could these queries be optimized?
UPDATE
1) Replaced GROUP BY
with SELECT DISTINCT
. (.74 sec)
2) Replace WHERE in
with WHERE exists
. (.40 sec)
3) Added indexes (oops!) (0.09 sec)
4) Back to WHERE in
(0.08 sec)
EXPLAIN SELECT DISTINCT tag_id FROM post_tags WHERE user_id in
(
SELECT DISTINCT user_id FROM post_tags WHERE tag_id in
(
SELECT DISTINCT tag_id FROM post_tags WHERE user_id = 3
)
)
Running EXPLAIN gives me these results:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY post_tags index NULL tag_id 4 NULL 14 Using where
2 DEPENDENT SUBQUERY post_tags index_subquery user_id user_id 4 func 1 Using where
3 DEPENDENT SUBQUERY post_tags index_subquery user_id,tag_id tag_id 4 func 1 Using where