As a follow-up to this question http://stackoverflow.com/questions/1202668/problem-with-sql-query which had a very neat solution, I was wondering what the next step would look like:
DOCUMENT_ID | TAG
----------------------------
1 | tag1
1 | tag2
1 | tag3
2 | tag2
3 | tag1
3 | tag2
4 | tag1
5 | tag3
So, to get all the document_ids that have both tag1 and tag2, we would run a query like this:
SELECT document_id
FROM table
WHERE tag = 'tag1' OR tag = 'tag2'
GROUP BY document_id
HAVING COUNT(DISTINCT tag) = 2
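To check that this query returns what I expect on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table name `doc_tags` and the helper `docs_with_all_tags` are hypothetical names chosen for illustration (the real table is just called "table" above), and SQLite stands in for whatever database is actually used.

```python
import sqlite3

# Sample rows copied from the table above.
ROWS = [
    (1, "tag1"), (1, "tag2"), (1, "tag3"),
    (2, "tag2"),
    (3, "tag1"), (3, "tag2"),
    (4, "tag1"),
    (5, "tag3"),
]


def docs_with_all_tags(conn, tags):
    """Return the document_ids that carry every tag in `tags`."""
    tags = sorted(set(tags))  # duplicates would break the count comparison
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT document_id
        FROM doc_tags              -- hypothetical table name
        WHERE tag IN ({placeholders})
        GROUP BY document_id
        HAVING COUNT(DISTINCT tag) = ?
        ORDER BY document_id
    """
    return [row[0] for row in conn.execute(sql, (*tags, len(tags)))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_tags (document_id INTEGER, tag TEXT)")
conn.executemany("INSERT INTO doc_tags VALUES (?, ?)", ROWS)

both = docs_with_all_tags(conn, ["tag1", "tag2"])
print(both)  # [1, 3]
```

The `HAVING` count is tied to the number of requested tags, so the same helper covers one tag, two tags, or more.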
Now, what would be interesting to know is how to get all the distinct document_ids that have both tag1 and tag2, plus the ids that have tag3. We could imagine running the same kind of query for each condition and taking the union of the results:
SELECT document_id
FROM table
WHERE tag = 'tag1' OR tag = 'tag2'
GROUP BY document_id
HAVING COUNT(DISTINCT tag) = 2
UNION
SELECT document_id
FROM table
WHERE tag = 'tag3'
GROUP BY document_id
But I was wondering whether, with that extra condition, there is a better way to write the initial query. I can imagine having many such unions, each with different tags and tag counts. Wouldn't chaining unions like that be very bad for performance?
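One alternative I can imagine, sketched here under assumptions rather than offered as a known-best answer, is to scan the table once and put each condition into the HAVING clause as a conditional aggregate, joined with OR. The sketch below uses Python's sqlite3 module with a hypothetical table name `doc_tags`, and compares the result with the union version on the sample data. Whether the single pass is actually faster than the unions depends on the database engine and the available indexes, so the query plan would need to be checked on the real system.

```python
import sqlite3

# Sample rows copied from the table above.
ROWS = [
    (1, "tag1"), (1, "tag2"), (1, "tag3"),
    (2, "tag2"),
    (3, "tag1"), (3, "tag2"),
    (4, "tag1"),
    (5, "tag3"),
]

# The union version from the question (doc_tags is a hypothetical table name).
UNION_SQL = """
    SELECT document_id FROM doc_tags
    WHERE tag IN ('tag1', 'tag2')
    GROUP BY document_id
    HAVING COUNT(DISTINCT tag) = 2
    UNION
    SELECT document_id FROM doc_tags
    WHERE tag = 'tag3'
"""

# Single pass: each OR branch in HAVING plays the role of one union member.
# CASE without ELSE yields NULL, which COUNT ignores.
SINGLE_PASS_SQL = """
    SELECT document_id FROM doc_tags
    WHERE tag IN ('tag1', 'tag2', 'tag3')
    GROUP BY document_id
    HAVING COUNT(DISTINCT CASE WHEN tag IN ('tag1', 'tag2') THEN tag END) = 2
        OR COUNT(CASE WHEN tag = 'tag3' THEN 1 END) >= 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_tags (document_id INTEGER, tag TEXT)")
conn.executemany("INSERT INTO doc_tags VALUES (?, ?)", ROWS)

# Sort in Python so the comparison does not rely on the engine's row order.
union_ids = sorted(row[0] for row in conn.execute(UNION_SQL))
single_pass_ids = sorted(row[0] for row in conn.execute(SINGLE_PASS_SQL))

print(union_ids)        # [1, 3, 5]
print(single_pass_ids)  # [1, 3, 5]
```

Adding another tag group would mean adding one more OR branch to HAVING (and its tags to the WHERE list) instead of another UNION member.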