views:

617

answers:

3

I've got two tables in SQL, one with a project and one with categories that projects belong to, i.e. the JOIN would look roughly like:

Project | Category
--------+---------
  Foo   | Apple
  Foo   | Banana
  Foo   | Carrot
  Bar   | Apple
  Bar   | Carrot
  Qux   | Apple
  Qux   | Banana

(Strings replaced with IDs from a higher normal form, obviously, but you get the point here.)

What I want to do is allow filtering such that users can select any number of categories and results will be filtered to items that are members of all the selected categories. For example, if a user selects categories "Apple" and "Banana", projects "Foo" and "Qux" show up. If a user select categories "Apple", "Banana", and "Carrot" then only the "Foo" project shows up.

The first thing I tried was a simple SELECT DISTINCT Project FROM ... WHERE Category = 'Apple' AND Category = 'Banana', but of course that doesn't work since Apple and Banana show up in the same column in two different rows for any common project.

GROUP BY and HAVING don't do me any good, so tell me: is there an obvious way to do this that I'm missing, or is it really so complicated that I'm going to have to resort to recursive joins?

This is in PostgreSQL, by the way, but of course standard SQL code is always preferable when possible.

+3  A: 

Since a project can only be in a category once, we can use COUNT to pull this stunt off:

SELECT project, COUNT(category) AS cat_count
  FROM /* your join */
  WHERE category IN ('apple', 'banana')
  GROUP BY project
  HAVING cat_count = 2

A project with a category of only apple or banana will get a count of 1, and thus fail the HAVING clause. Only a project with both categories will get a count of 2.

If for some reason you have duplicate categories, you can use something like COUNT(DISTINCT category). COUNT(*) should work as well, and differs only if category can be null.

derobert
The OP wants where a project has BOTH/all categories, not either. Also, having count would be valid if a project has two instances of 'Apple'.
OMG Ponies
That's what the having clause accomplishes, and the note about distinct covers your other issue.
krdluzni
@krdluzni: ONLY if you can guarantee that a category and project combination is unique.
OMG Ponies
In this particular case it is (as guaranteed by a multiple key on the table) although it's possible that future expansion will need similar solutions without having the luxury of unique entries. I do like this one, though.
Chris
+4  A: 
Quassnoi
Sometimes SQL makes my brain hurt, and this is one of those times--the code works great, and it's very efficient, but I wouldn't have ever thought of doing two "Select where not exists" queries. Thanks!
Chris
`@Chris`: That's formulating a procedural approach in set-based terms. If you look closer into this query, it's doing just the same as a programmer would do: "for each project, take each category and check that it exists in this project", but without cursors or loops. It's a good illustration to this article I wrote some time ago: http://explainextended.com/2009/07/12/double-thinking-in-sql/
Quassnoi
A: 

One other solution is, of course, something like "SELECT DISTINCT Project FROM ... AS a WHERE 'Apple' IN (SELECT Category FROM ... AS b WHERE a.Project = b.Project) AND 'Banana' IN (SELECT Category FROM ... AS b WHERE a.Project = b.Project)", but that gets pretty computationally expensive pretty quickly. I was hoping for something more elegant, and you guys haven't disappointed. I'm including this one mostly for completeness in case someone else consults this question. It's clearly worth zero points. :)

Chris