tags:

views:

101

answers:

6

There are two tables, categories and books and I'd like to select all books based on the given categories.

Categories table:

cat_id | book_id
----------------
1      | 1
2      | 1
3      | 1
3      | 2

Books table:

id  | name
----------------
1   | abc
2   | def

I've tried SELECT * FROM categories WHERE cat_id IN(1,3) but then it returns books containing at least one of the given categories. What I'd like is that it only returns the books containing ALL the categories, so it should only return all (or one) rows where book_id = 1 since it's the only book with all the given categories.

+3  A: 

You actually get more than one entry per book. If n of n categories are assigned to the book, you get n entries for the book. So you could group your query and select only those who have n hits:

SELECT T.cat_id, count(*) hits FROM
(
    SELECT * FROM categories WHERE cat_id IN(1,3)
) T
GROUP BY T.cat_id
HAVING hits = 2
chiccodoro
But then it still returns all the books containing at least one of the given categories, while that is not wanted behaviour.
EarthMind
That's a good idea.
Pointy
@Earthmind you'd add a "having" clause at the end and only accept rows where the count is 2 (or whatever; the number of categories in your "IN" list).
Pointy
@Pointy: you're right, just replaced the "where" by "having".
chiccodoro
select book.id, book.name, count(category.id) from book join category on book.id = category.id where category in (1, 3) group by book.id, book.name having count(category.id) = 2
Pointy
cool, post that as an answer instead
chiccodoro
@chiccodoro, he wants the book that is on the three categories, so you r answer does not answer the question, because returns only a book on category 1 and 3
pcent
That `HAVING` could just be `WHERE`, a virtual table is not an aggregate.
Evan Carroll
I see, I didn't realize I did it in a separate select statement. Of course that can be merged. I've edited my answer once more...
chiccodoro
@pcent: You're wrong, see the accepted answer
chiccodoro
A: 

Try this:

SELECT * FROM books WHERE id IN 
(SELECT book_id
FROM categories
GROUP BY book_id 
HAVING COUNT(distinct cat_id)  = (select count(distinct cat_id) from categories))

Edited: I edited the query so it returns the books containing ALL categories as stated in the question

pcent
here you don't check what categories a book is assigned to
chiccodoro
this query returns books that are in two categories
pcent
right, so it does not answer the question
chiccodoro
@chiccodoro, I edited it to 3, so it do answer the question, try it yourself. The book on three categories will be returned. This is what he asks for...
pcent
I guess you misunderstood the question. The book should not be in *all* existing categories, but in all of the provided set, in the example it was (1,3)
chiccodoro
A: 

Join against each category that you require:

SELECT books.*
FROM books
     JOIN categories cat1 ON cat1.book_id = books.book_id
     JOIN categories cat3 ON cat3.book_id = books.book_id
WHERE cat1.cat_id = 1
      AND cat3.cat_id = 3

Or you do this equivalently using WHERE EXISTS (semi join) if you don't like to add inner joins.

araqnid
+1  A: 

Yet another alternative method:

SELECT book_id FROM categories WHERE cat_id = 1 
INTERSECT 
SELECT book_id FROM categories WHERE cat_id = 3;

You can continue to chain INTERSECTs if you have more than two categories to match.

Matthew Wood
That would require a separate select for each category, and would grow the complexity in code and operation for each added category. You're visting indexes or scanning tables needlessly.
Evan Carroll
Not sure if this warrants a down-vote. My intent on adding this example is to show that this is really a set-based problem and that there is a "proper" set-based solution that PostgreSQL supports. I made no claims about the performance or ease of use from an application code point-of-view. Indeed, I also use the HAVING SUM(CASE...) method in my own application code, but for ad-hoc requests, I find INTERSECT to be easier both to read and to write.
Matthew Wood
+2  A: 

Try:

select book_id
from categories
group by book_id
having sum( ( cat_id in (1,3) )::int ) = 2

Or if you intend to pass an array to postgres from language that supports passing array directly to it(like this: http://fxjr.blogspot.com/2009/05/npgsql-tips-using-in-queries-with.html), use this:

select book_id
from categories
group by book_id
having sum( ( cat_id = ANY(ARRAY[1,3]) )::int ) = 2

If you want to get the book name:

select categories.book_id, books.name
from categories
join books on books.id = categories.book_id
group by categories.book_id
    ,books.name
having sum( ( categories.cat_id in (1,3) )::int ) = 2

@Evan Carroll, amending the query:

ANSI SQL way:

select categories.book_id, books.name
from categories
join books on books.id = categories.book_id
group by categories.book_id
    ,books.name
having count(case when categories.cat_id in (1,3) then 1 end) = 2

Sans the book name:

select book_id
from categories
group by book_id
having count( case when cat_id in (1,3) then 1 end ) = 2

What's the advantage of inlining the condition and its count value in the same clause(i.e. having) as opposed to separately putting the condition in where clause and its count in having clause?...

select book_id
from categories
where category_id in (1,3)
group by book_id
having count(*) = 2

...If we inline both the condition and its count value in having clause, we can facilitate an inquiry of let's say list all books with categories of 1 and 3, or with categories of 2 and 3 and 4. Future-proofing FTW! Plus the testing of combined categories and its count are next to each other, plus factor in terms of readability.

To facilitate that kind of query:

select book_id
from categories
group by book_id
having 
    count( case when cat_id in (1,3) then 1 end ) = 2 
    or count( case when cat_id in (2,3,4) then 1 end ) = 3

To achieve performance(sometimes, achieving both performance and readability; don't mix well), must duplicate the testing of elements of having clause to where clause:

select book_id
from categories
where cat_id in (1,2,3,4)
group by book_id
having 
    count( case when cat_id in (1,3) then 1 end ) = 2 
    or count( case when cat_id in (2,3,4) then 1 end ) = 3

[EDIT]

BTW, here's the idiomatic MySQL:

select book_id
from categories
group by book_id
having sum( cat_id in (1,3) ) = 2
Michael Buen
this seems rather awkward, and wrong. sum is made for adding arguments, `count()` is made for counting rows. See my answer for a much easier way to do this.
Evan Carroll
before you say it's wrong, it's an idiomatic postgres. if i'm using mysql, i will do this: `sum( categories.cat_id in (1,3) )`, for the fact that in mysql, boolean and integer are the same, they are just 1 and 0 behind the scenes, so no more casting needed. for postgresql, we just need to cast boolean to integer so things will work as intended. ok.., for you i will make it ANSI SQL-compliant. editing coming up
Michael Buen
A: 
SELECT * FROM 
(
 SELECT b.id, count(c.cat_id) as cat_count
 FROM books AS b
 JOIN cats AS c
   ON ( b.id = c.book_id )
 GROUP BY b.id
) AS t
WHERE t.cat_count = ( SELECT DISTINCT count(cat_id) FROM cat );

This assumes one book can't be in the same category twice. This selects all books in either category, counts the categories, and makes sure the category count is the max number of categories.

Evan Carroll