Let's say I have the following tables:

TAGS

id: integer
name: string

POSTS

id: integer
body: text

TAGGINGS

id: integer
tag_id: integer
post_id: integer

How would I go about writing a query that selects all posts that are tagged with ALL of the following tags (the name attribute of the tags table): "Cheese", "Wine", "Paris", "Frace", "City", "Scenic", "Art"?

See also: http://stackoverflow.com/questions/3876251/need-help-with-sql-query-involving-tagging-calculations (note: similar, but not a duplicate!)

+1  A: 

Using IN:

SELECT p.*
  FROM POSTS p
 WHERE p.id IN (SELECT tg.post_id
                  FROM TAGGINGS tg
                  JOIN TAGS t ON t.id = tg.tag_id
                 WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
              GROUP BY tg.post_id
                HAVING COUNT(DISTINCT t.name) = 7)

Using a JOIN:

SELECT p.*
  FROM POSTS p
  JOIN (SELECT tg.post_id
          FROM TAGGINGS tg
          JOIN TAGS t ON t.id = tg.tag_id
         WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
      GROUP BY tg.post_id
        HAVING COUNT(DISTINCT t.name) = 7) x ON x.post_id = p.id

Using EXISTS:

SELECT p.*
  FROM POSTS p
 WHERE EXISTS (SELECT NULL
                 FROM TAGGINGS tg
                 JOIN TAGS t ON t.id = tg.tag_id
                WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
                  AND tg.post_id = p.id
             GROUP BY tg.post_id
               HAVING COUNT(DISTINCT t.name) = 7)

Explanation

The crux is that COUNT(DISTINCT t.name) must equal the number of tag names in the list (7 here), which ensures that every one of those tags is related to the post. Without the DISTINCT, a post with duplicate taggings of one name could reach a count of 7 while still missing another tag, so you'd have a false positive.
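To make the false positive concrete, here is a minimal sketch that runs the IN variant against an in-memory SQLite database through Python's sqlite3 module. The sample posts, the `matching_posts` helper, and the lowercase table names are assumptions made for illustration; it also assumes a SQLite build that accepts `COUNT(DISTINCT ...)`, as current releases do.

```python
import sqlite3

TAG_NAMES = ["Cheese", "Wine", "Paris", "Frace", "City", "Scenic", "Art"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE taggings (id INTEGER PRIMARY KEY, tag_id INTEGER, post_id INTEGER);
""")
conn.executemany("INSERT INTO tags (id, name) VALUES (?, ?)",
                 list(enumerate(TAG_NAMES, start=1)))
conn.executemany("INSERT INTO posts (id, body) VALUES (?, ?)",
                 [(1, "has all seven tags"), (2, "six tags, one applied twice")])

# Post 1: each of the seven tags exactly once.
rows = [(tag_id, 1) for tag_id in range(1, 8)]
# Post 2: tags 1..6 once, plus a duplicate tagging of tag 1
# (seven tagging rows, but only six distinct names).
rows += [(tag_id, 2) for tag_id in range(1, 7)] + [(1, 2)]
conn.executemany("INSERT INTO taggings (tag_id, post_id) VALUES (?, ?)", rows)


def matching_posts(count_expr):
    """Hypothetical helper: run the IN variant with the given COUNT expression."""
    placeholders = ", ".join("?" * len(TAG_NAMES))
    sql = f"""
        SELECT p.id
          FROM posts p
         WHERE p.id IN (SELECT tg.post_id
                          FROM taggings tg
                          JOIN tags t ON t.id = tg.tag_id
                         WHERE t.name IN ({placeholders})
                      GROUP BY tg.post_id
                        HAVING {count_expr} = ?)
      ORDER BY p.id
    """
    return [r[0] for r in conn.execute(sql, TAG_NAMES + [len(TAG_NAMES)])]


with_distinct = matching_posts("COUNT(DISTINCT t.name)")
without_distinct = matching_posts("COUNT(t.name)")
print(with_distinct)     # [1]
print(without_distinct)  # [1, 2]
```

Post 2 slips through only when the DISTINCT is dropped, because its duplicated tagging pads the row count to 7.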

Performance

Most will tell you the JOIN is optimal, but JOINs also risk duplicating rows in the result set. EXISTS would be my next choice: no duplicate risk, and generally faster execution. Ultimately, checking the explain plan will tell you what's best for your setup and data.
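As a rough sketch of what checking the plan can look like, the snippet below asks SQLite for its plan for each of the three variants using `EXPLAIN QUERY PLAN`. The index `idx_taggings_post` and the `VARIANTS` dictionary are assumptions added for illustration. Plan text differs between SQLite versions, and other engines have their own EXPLAIN syntax, so treat the output as something to read rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE taggings (id INTEGER PRIMARY KEY, tag_id INTEGER, post_id INTEGER);
    -- Hypothetical index, not part of the question's schema.
    CREATE INDEX idx_taggings_post ON taggings (post_id, tag_id);
""")

# Shared pieces of the subquery used by all three variants.
FILTER = """
      FROM taggings tg
      JOIN tags t ON t.id = tg.tag_id
     WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
"""
GROUPING = """
  GROUP BY tg.post_id
    HAVING COUNT(DISTINCT t.name) = 7
"""

VARIANTS = {
    "IN": f"SELECT p.* FROM posts p WHERE p.id IN "
          f"(SELECT tg.post_id {FILTER} {GROUPING})",
    "JOIN": f"SELECT p.* FROM posts p JOIN "
            f"(SELECT tg.post_id {FILTER} {GROUPING}) x ON x.post_id = p.id",
    "EXISTS": f"SELECT p.* FROM posts p WHERE EXISTS "
              f"(SELECT NULL {FILTER} AND tg.post_id = p.id {GROUPING})",
}

plans = {}
for label, sql in VARIANTS.items():
    # The last column of each plan row holds the human-readable step.
    plans[label] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(label)
    for step in plans[label]:
        print("   ", step)
```

Running the same comparison on a copy of the real data, with the real indexes, is what actually settles which variant is fastest.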

OMG Ponies
Would this work in all sql db's? In particular, will this work in both mysql and sqlite3?
tybro0103
wow, thanks for so many ways of doing it... which one is the most robust and fastest?
tybro0103
@tybro0103: The only point of contention is the `COUNT(DISTINCT ...)`, because older SQLite releases didn't support it (current releases accept `COUNT(DISTINCT x)` with a single argument). [See this link about a workaround](http://www.bernzilla.com/item.php?id=690)
OMG Ponies
I have controlled input on the tagging system so there should be no duplicates... does that change things for sqlite3 support?
tybro0103
@tybro0103: Good, then you can safely omit the DISTINCT and just use `HAVING COUNT(t.name) = 7` rather than what I listed in the answer. If it's OK, I'll leave the answer as-is so others are aware of the potential for false positives.
OMG Ponies
someone else please UPVOTE THIS ANSWER... it's awesome and deserves many points!
tybro0103
A: 

Try this:

Select * From Posts p
   Where Not Exists
       (Select * From tags t
        Where t.name in 
           ('Cheese', 'Wine', 'Paris', 
             'Frace', 'City', 'Scenic', 'Art')
           And Not Exists
             (Select * From taggings tg
              Where tg.tag_id = t.id
                And tg.post_id = p.id))

Explanation: Asking for the posts that have had every one of a specified set of tags associated with them is equivalent to asking for the posts for which there is no tag in that same set that has not been associated with them, which is what the SQL above expresses.
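The double NOT EXISTS reading can be checked with a small sketch, again against in-memory SQLite via Python's sqlite3 module. The three sample posts are assumptions made for illustration, and the column references (`t.id`, `p.id`) follow the schema given in the question.

```python
import sqlite3

TAG_NAMES = ["Cheese", "Wine", "Paris", "Frace", "City", "Scenic", "Art"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE taggings (id INTEGER PRIMARY KEY, tag_id INTEGER, post_id INTEGER);
""")
conn.executemany("INSERT INTO tags (id, name) VALUES (?, ?)",
                 list(enumerate(TAG_NAMES, start=1)))
conn.executemany("INSERT INTO posts (id, body) VALUES (?, ?)",
                 [(1, "all seven tags"), (2, "only six tags"), (3, "no tags")])

# Post 1 gets tags 1..7, post 2 gets tags 1..6, post 3 gets none.
rows = [(tag_id, 1) for tag_id in range(1, 8)]
rows += [(tag_id, 2) for tag_id in range(1, 7)]
conn.executemany("INSERT INTO taggings (tag_id, post_id) VALUES (?, ?)", rows)

placeholders = ", ".join("?" * len(TAG_NAMES))
sql = f"""
    SELECT p.id
      FROM posts p
     WHERE NOT EXISTS
           (SELECT * FROM tags t
             WHERE t.name IN ({placeholders})
               AND NOT EXISTS
                   (SELECT * FROM taggings tg
                     WHERE tg.tag_id = t.id
                       AND tg.post_id = p.id))
  ORDER BY p.id
"""
# A post survives only if no listed tag lacks a tagging row for it.
result = [r[0] for r in conn.execute(sql, TAG_NAMES)]
print(result)  # [1]
```

One difference from the COUNT-based queries: a name in the list that has no row in the tags table is silently ignored here, whereas the COUNT comparison could then never reach 7.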

Charles Bretana
You're missing the correlation between `TAGS` and `POSTS`
OMG Ponies
I could be wrong, but look again... what about the last line? The query says "Show me the posts where there is no tag (in the input list) which is NOT in the taggings table for that Post"
Charles Bretana
You need to have a ref to `TAGGINGS` in the `TAGS` query to relate to the POST...
OMG Ponies
Why? The where condition in the subquery on tags restricts it to those tags that are not in the taggings table for the specified Post. All the `Select * From Tags ...` portion does is initialize the list of the seven tags in the list.
Charles Bretana
Again, think about it semantically: "Show me the posts where there is No Tag in the List ('Cheese', 'Wine', 'Paris', 'Frace', 'City', 'Scenic', 'Art') that is NOT represented in the Taggings table for that specific Post". It is in the second subquery that the correlation to the Posts table needs to be established, and it is in there.
Charles Bretana
@OMG Ponies: Just to be sure, I set this up and tried it out, and it works as I surmised...
Charles Bretana