tags:

views:

734

answers:

5

Suppose I have a "tags" table with two columns: tagid and contentid. Each row represents a tag assigned to a piece of content. I want a query that will give me the contentid of every piece of content which is tagged with tagids 334, 338, and 342.

The "easy" way to do this would be (pseudocode):

select contentid from tags where tagid = 334 and contentid in (
    select contentid from tags where tagid = 338 and contentid in (
        select contentid from tags where tagid = 342
    )
)

However, my gut tells me that there's a better, faster, more extensible way to do this. For example, what if I needed to find the intersection of 12 tags? This could quickly get horrendous. Any ideas?

EDIT: Turns out that this is also covered in this excellent blog post.

+1  A: 

The only alternative way i can think of is:

select a.contentid from tags a
inner join tags b on a.contentid = b.contentid and b.tagid=334
inner join tags c on a.contentid = c.contentid and c.tagid=342
where a.tagid=338
AlbertEin
+19  A: 
SELECT contentID
FROM tags
WHERE tagID in (334, 338, 342)
GROUP BY contentID
HAVING COUNT(DISTINCT tagID) = 3


--In general
SELECT contentID
FROM tags
WHERE tagID in (...) --taglist
GROUP BY contentID
HAVING COUNT(DISTINCT tagID) = ... --tagcount
David B
That's it. Although if tagid, contentid is unique, then perhaps DISTINCT is not needed in COUNT(DISTINCT tagID)
ΤΖΩΤΖΙΟΥ
Very smart solution but it would need a tweak on our installation of SQL2000. We can't use "having" without having an aggregation in the select clause. Easy enough to work around though.
TrickyNixon
Would this work in MySQL as well as Oracle?
Max
I only know t-sql Max, maybe someone else knows about other sql flavors?
David B
I haven't tested this in other DBs, but all of the commands used are part of the SQL standard, so they *should* work.
R. Bemrose
A: 

What type of SQL? MS SQL Server, Oracle, MySQL?

In SQL Server doesn't this equate to:

select contentid from tags where tagid IN (334,338,342)
Meff
No, that would give you all the articles in the 3 tags, he wants all the articles which have the same 3 tags
AlbertEin
Yes, you're dead right, David B nailed it.
Meff
A: 

I don't know if this is better but it might be more maintainable

select contentid from tags where tagid = 334
intersect
select contentid from tags where tagid = 338
intersect
select contentid from tags where tagid = 342

You'd have to build it dynamically which wouldn't be as bad as your original solution.

TrickyNixon
A: 

Here's a solution that has worked much faster than the for me on a very large database of objects and tags. This is an example for a three-tag intersection. It just chains many joins on the object-tag table (objtags) to indicate the same object and stipulates the tag IDs in the WHERE clause:

SELECT w0.objid

FROM       objtags t0
INNER JOIN objtags t1 ON t1.objid=t0.objid
INNER JOIN objtags t2 ON t2.objid=t1.objid

WHERE t0.tagid=512
  AND t1.tagid=256
  AND t2.tagid=128

I have no idea why this runs faster. It was inspired by the search code in the MusicBrainz server. Doing this in Postgres, I usually get a ~8-10x speedup over the HAVING COUNT(...) solution.

adrian
You're using set intersection instead of aggregation to determine whether all three values apply. Less thrashing. That's gorgeous, I wish I'd thought of it.
Peter Wone