tags:

views:

32

answers:

2

Hi everyone,

I have a table articles, another tags, and a third called article_tags. I want to produce a page which lists all of the articles for a specific tag.

My query looks like this:

SELECT headline, GROUP_CONCAT(tags.tag_name) AS all_tags FROM articles
LEFT JOIN articles_tags ON articles.article_id = articles_tags.article_id
LEFT JOIN tags ON articles_tags.tag_id = tags.tag_id
WHERE tags.tag_name = 'japan'
GROUP BY articles.article_id

All of the returned articles only have japan as a tag, even when the article in question has several tags.

This is obviously related to the WHERE clause, but I can't figure out how to do what I want here - ideally I'd end up with a list like japan,china,korea instead. Is this the place for a subquery? Could do with a SQL guru to advise.

Thanks, Matt

+2  A: 

There are at least two approaches you could use. One approach is to join with the tables twice. The other as you point out is to use a subquery. For simiplicity and ease of reading, I'd probably go with the subquery here. The resulting query would look something like this:

SELECT
    headline,
    GROUP_CONCAT(tags.tag_name) AS all_tags
FROM articles
JOIN articles_tags ON articles.article_id = articles_tags.article_id
JOIN tags ON articles_tags.tag_id = tags.tag_id
WHERE articles.article_id IN (
    SELECT articles.article_id
    FROM articles
    JOIN articles_tags ON articles.article_id = articles_tags.article_id
    JOIN tags ON articles_tags.tag_id = tags.tag_id
    WHERE tags.tag_name = 'japan'
)
GROUP BY articles.article_id

And here's the approach using more JOINs:

SELECT
    a.headline,
    GROUP_CONCAT(t2.tag_name) AS all_tags
FROM articles a
JOIN articles_tags at1 ON a.article_id = at1.article_id
JOIN tags t1 ON at1.tag_id = t1.tag_id AND t1.tag_name = 'tag1'
JOIN articles_tags at2 ON a.article_id = at2.article_id
JOIN tags t2 ON at2.tag_id = t2.tag_id
GROUP BY a.article_id;
Mark Byers
This works, thank you - but it took a minute to run! Hmm. Might need some better indexing...
Matt Andrews
@Matt Andrews: Yes this definitely needs indexing otherwise the subquery will be run multiple times. You can also try the approach using joins.
Mark Byers
@Matt Andrews: I've updated my answer to include an approach using joins.
Mark Byers
@Matt Andrews: Interested to see which of the three is the most efficient...
OMG Ponies
Thanks guys - here's some rough benchmarks:First query by @Mark Byers (subquery) - ~50 seconds.Second query by @OMG Ponies (EXISTS) ~ 0.5 seconds.Third query by @Mark Byers (joins) - ~0.009 seconds. Ding ding, we have a winner! Thanks a ton to you both for the suggestions, you've been really helpful :)
Matt Andrews
@Matt Andrews: Thx for comparing, +1 to Mark
OMG Ponies
+2  A: 

Using EXISTS:

   SELECT a.headline,
          GROUP_CONCAT(t.tag_name) AS all_tags
     FROM ARTICLES a
LEFT JOIN ARTICLES_TAGS at ON at.article_id = a.article_id
LEFT JOIN TAGS t ON t.tag_id = at.tag_id
    WHERE EXISTS(SELECT NULL
                   FROM ARTICLES x
                   JOIN ARTICLE_TAGS y ON y.article_id = x.article_id
                   JOIN TAGS z ON z.tag_id = y.tag_id
                              AND z.tag_name = 'japan'
                  WHERE x.article_id = a.article_id)
 GROUP BY a.article_id

No need to use LEFT JOINs in the subquery if you're only interested in the ones associated to the "japan" tag.

OMG Ponies
I get "Unknown column 'x.tag_id' in 'on clause'" with this query.
Mark Byers
@Mark Byers: Thx, corrected.
OMG Ponies
This is much quicker (0.2 sec), even after updating a few foreign keys and indexing. Just going to try Mark's other suggestion though.
Matt Andrews