ansaurus

Question

SQL: How do I make a selection based on categories?

Answer 1

+3 A:

You actually get more than one entry per book. If n of the n requested categories are assigned to a book, you get n rows for that book. So you can group your query by book and keep only the books that have n hits:

SELECT T.book_id, count(*) AS hits FROM
(
    SELECT * FROM categories WHERE cat_id IN (1,3)
) T
GROUP BY T.book_id
HAVING count(*) = 2
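
A minimal sketch of the group-and-count idea described above, run against an in-memory SQLite database from Python. The two-column categories(book_id, cat_id) layout, the helper name books_in_all and the sample rows are assumptions for illustration; the asker's real schema may differ.

```python
import sqlite3

def books_in_all(conn, cat_ids):
    """Return ids of books assigned to every category in cat_ids."""
    wanted = sorted(set(cat_ids))
    placeholders = ",".join("?" * len(wanted))
    sql = f"""
        SELECT book_id
        FROM categories
        WHERE cat_id IN ({placeholders})
        GROUP BY book_id
        HAVING COUNT(DISTINCT cat_id) = ?
        ORDER BY book_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

# Hypothetical sample data: (book_id, cat_id) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (book_id INTEGER, cat_id INTEGER)")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 3), (4, 2), (4, 3), (4, 4)],
)

result = books_in_all(conn, [1, 3])
print(result)
```

The expected count is taken from the length of the category list, so the same helper works for any number of requested categories.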

chiccodoro 2010-06-23 14:11:51

But then it still returns all the books containing at least one of the given categories, while that is not wanted behaviour.

EarthMind 2010-06-23 14:13:35

That's a good idea.

Pointy 2010-06-23 14:14:12

@Earthmind you'd add a "having" clause at the end and only accept rows where the count is 2 (or whatever; the number of categories in your "IN" list).

Pointy 2010-06-23 14:14:58

@Pointy: you're right, just replaced the "where" by "having".

chiccodoro 2010-06-23 14:15:53

select book.id, book.name, count(category.cat_id) from book join category on book.id = category.book_id where category.cat_id in (1, 3) group by book.id, book.name having count(category.cat_id) = 2

Pointy 2010-06-23 14:17:14

cool, post that as an answer instead

chiccodoro 2010-06-23 14:17:57

@chiccodoro, he wants the book that is in all three categories, so your answer does not answer the question, because it returns only the books in categories 1 and 3

pcent 2010-06-23 14:31:14

That `HAVING` could just be `WHERE`, a virtual table is not an aggregate.

Evan Carroll 2010-06-23 14:47:09

I see, I didn't realize I did it in a separate select statement. Of course that can be merged. I've edited my answer once more...

chiccodoro 2010-06-24 06:22:42

@pcent: You're wrong, see the accepted answer

chiccodoro 2010-06-24 06:24:08

Answer 2

A:

Try this:

SELECT * FROM books WHERE id IN 
(SELECT book_id
FROM categories
GROUP BY book_id 
HAVING COUNT(distinct cat_id)  = (select count(distinct cat_id) from categories))

Edited: I edited the query so it returns the books containing ALL categories as stated in the question

pcent 2010-06-23 14:16:43

here you don't check what categories a book is assigned to

chiccodoro 2010-06-23 14:18:26

this query returns books that are in two categories

pcent 2010-06-23 14:20:25

right, so it does not answer the question

chiccodoro 2010-06-23 14:24:40

@chiccodoro, I edited it to 3, so it does answer the question, try it yourself. The book in all three categories will be returned. This is what he asks for...

pcent 2010-06-23 14:26:33

I guess you misunderstood the question. The book should not be in *all* existing categories, but in all categories of the provided set; in the example that set was (1,3)

chiccodoro 2010-06-24 06:21:52

Answer 3

A:

Join against each category that you require:

SELECT books.*
FROM books
     JOIN categories cat1 ON cat1.book_id = books.book_id
     JOIN categories cat3 ON cat3.book_id = books.book_id
WHERE cat1.cat_id = 1
      AND cat3.cat_id = 3

Or you can do this equivalently using WHERE EXISTS (a semi-join) if you'd rather not add inner joins.
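
A rough sketch of what the WHERE EXISTS variant might look like, run through Python's sqlite3 module. The books(id, name) and categories(book_id, cat_id) tables and their rows are assumed for illustration only (the answer above names the key books.book_id instead).

```python
import sqlite3

# Hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE categories (book_id INTEGER, cat_id INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "D")],
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 3), (4, 2), (4, 3), (4, 4)],
)

# One EXISTS test per required category; no extra rows are joined in.
sql = """
    SELECT b.id, b.name
    FROM books AS b
    WHERE EXISTS (SELECT 1 FROM categories AS c
                  WHERE c.book_id = b.id AND c.cat_id = 1)
      AND EXISTS (SELECT 1 FROM categories AS c
                  WHERE c.book_id = b.id AND c.cat_id = 3)
    ORDER BY b.id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Unlike the join form, each book appears at most once even if the categories table contains duplicate rows.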

araqnid 2010-06-23 14:18:58

Answer 4

+1 A:

Yet another alternative method:

SELECT book_id FROM categories WHERE cat_id = 1 
INTERSECT 
SELECT book_id FROM categories WHERE cat_id = 3;

You can continue to chain INTERSECTs if you have more than two categories to match.
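
As a sketch of how the chaining could be generated for an arbitrary category list, the Python snippet below builds one SELECT per category and joins them with INTERSECT. It runs on SQLite for convenience; the helper names and sample rows are hypothetical.

```python
import sqlite3

def intersect_sql(n):
    """Build n single-category SELECTs chained with INTERSECT."""
    if n < 1:
        raise ValueError("need at least one category")
    one = "SELECT book_id FROM categories WHERE cat_id = ?"
    return " INTERSECT ".join([one] * n)

def books_in_all(conn, cat_ids):
    """Return sorted ids of books that are in every given category."""
    wanted = sorted(set(cat_ids))
    rows = conn.execute(intersect_sql(len(wanted)), wanted)
    return sorted(row[0] for row in rows)

# Hypothetical sample data: (book_id, cat_id) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (book_id INTEGER, cat_id INTEGER)")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 3), (4, 2), (4, 3), (4, 4)],
)

two = books_in_all(conn, [1, 3])
three = books_in_all(conn, [2, 3, 4])
print(two, three)
```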

Matthew Wood 2010-06-23 14:21:24

That would require a separate select for each category, and would grow the complexity in code and operation for each added category. You're visiting indexes or scanning tables needlessly.

Evan Carroll 2010-06-23 15:30:13

Not sure if this warrants a down-vote. My intent in adding this example is to show that this is really a set-based problem and that there is a "proper" set-based solution that PostgreSQL supports. I made no claims about the performance or ease of use from an application code point of view. Indeed, I also use the HAVING SUM(CASE...) method in my own application code, but for ad-hoc requests, I find INTERSECT to be easier both to read and to write.

Matthew Wood 2010-06-24 14:55:17

Answer 5

+2 A:

Try:

select book_id
from categories
group by book_id
having sum( ( cat_id in (1,3) )::int ) = 2

Or, if you intend to pass an array to postgres from a language that supports passing arrays directly to it (like this: http://fxjr.blogspot.com/2009/05/npgsql-tips-using-in-queries-with.html), use this:

select book_id
from categories
group by book_id
having sum( ( cat_id = ANY(ARRAY[1,3]) )::int ) = 2

If you want to get the book name:

select categories.book_id, books.name
from categories
join books on books.id = categories.book_id
group by categories.book_id
    ,books.name
having sum( ( categories.cat_id in (1,3) )::int ) = 2

@Evan Carroll, amending the query:

ANSI SQL way:

select categories.book_id, books.name
from categories
join books on books.id = categories.book_id
group by categories.book_id
    ,books.name
having count(case when categories.cat_id in (1,3) then 1 end) = 2

Sans the book name:

select book_id
from categories
group by book_id
having count( case when cat_id in (1,3) then 1 end ) = 2

What's the advantage of inlining the condition and its count value in the same clause (i.e. HAVING), as opposed to putting the condition in the WHERE clause and its count in the HAVING clause?...

select book_id
from categories
where cat_id in (1,3)
group by book_id
having count(*) = 2

...If we inline both the condition and its count value in the HAVING clause, we can support a query such as: list all books that are in categories 1 and 3, or in categories 2, 3 and 4. Future-proofing FTW! Also, the test for a category combination and its expected count sit next to each other, which is a plus for readability.

To facilitate that kind of query:

select book_id
from categories
group by book_id
having 
    count( case when cat_id in (1,3) then 1 end ) = 2 
    or count( case when cat_id in (2,3,4) then 1 end ) = 3

For performance (performance and readability sometimes don't mix well), you must duplicate the category tests from the HAVING clause in the WHERE clause:

select book_id
from categories
where cat_id in (1,2,3,4)
group by book_id
having 
    count( case when cat_id in (1,3) then 1 end ) = 2 
    or count( case when cat_id in (2,3,4) then 1 end ) = 3
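
A small check, sketched in Python with sqlite3, that the WHERE pre-filter does not change the result as long as its list is the union of every category list tested in HAVING. The sample rows are invented for illustration, and SQLite stands in for postgres here.

```python
import sqlite3

# Shared GROUP BY / HAVING tail: (1 and 3) or (2 and 3 and 4).
TAIL = """
    GROUP BY book_id
    HAVING COUNT(CASE WHEN cat_id IN (1,3) THEN 1 END) = 2
        OR COUNT(CASE WHEN cat_id IN (2,3,4) THEN 1 END) = 3
    ORDER BY book_id
"""
UNFILTERED = "SELECT book_id FROM categories" + TAIL
# The WHERE list must cover every category mentioned in HAVING.
PREFILTERED = "SELECT book_id FROM categories WHERE cat_id IN (1,2,3,4)" + TAIL

# Hypothetical sample data: (book_id, cat_id) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (book_id INTEGER, cat_id INTEGER)")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 3),
     (4, 2), (4, 3), (4, 4), (5, 5)],
)

unfiltered = [r[0] for r in conn.execute(UNFILTERED)]
prefiltered = [r[0] for r in conn.execute(PREFILTERED)]
print(unfiltered, prefiltered)
```

The pre-filter only discards rows that no CASE branch would have counted, so the grouped result stays the same while fewer rows reach the aggregation step.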

[EDIT]

BTW, here's the idiomatic MySQL:

select book_id
from categories
group by book_id
having sum( cat_id in (1,3) ) = 2

Michael Buen 2010-06-23 14:36:04

this seems rather awkward, and wrong. sum is made for adding arguments, `count()` is made for counting rows. See my answer for a much easier way to do this.

Evan Carroll 2010-06-23 14:52:58

before you say it's wrong, it's idiomatic postgres. if i'm using mysql, i will do this: `sum( categories.cat_id in (1,3) )`, because in mysql boolean and integer are the same, they are just 1 and 0 behind the scenes, so no casting is needed. for postgresql, we just need to cast boolean to integer so things will work as intended. ok.., for you i will make it ANSI SQL-compliant. edit coming up

Michael Buen 2010-06-23 15:04:23

Answer 6

A:

SELECT * FROM 
(
 SELECT b.id, count(c.cat_id) as cat_count
 FROM books AS b
 JOIN categories AS c
   ON ( b.id = c.book_id )
 GROUP BY b.id
) AS t
WHERE t.cat_count = ( SELECT count(DISTINCT cat_id) FROM categories );

This assumes one book can't be in the same category twice. This selects all books in either category, counts the categories, and makes sure the category count is the max number of categories.
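
If that assumption cannot be guaranteed, counting distinct categories per book is one way to guard against duplicate rows. The sketch below (Python with sqlite3, invented sample rows) compares the plain count with COUNT(DISTINCT ...) on data where book 2 is listed in category 1 twice.

```python
import sqlite3

# Plain count: duplicate (book_id, cat_id) rows inflate the total.
PLAIN = """
    SELECT book_id FROM categories
    GROUP BY book_id
    HAVING COUNT(cat_id) = (SELECT COUNT(DISTINCT cat_id) FROM categories)
    ORDER BY book_id
"""
# Distinct count: each category is counted once per book.
DISTINCT = """
    SELECT book_id FROM categories
    GROUP BY book_id
    HAVING COUNT(DISTINCT cat_id) = (SELECT COUNT(DISTINCT cat_id) FROM categories)
    ORDER BY book_id
"""

# Hypothetical data: three categories exist; book 2 repeats category 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (book_id INTEGER, cat_id INTEGER)")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 1), (2, 2)],
)

plain = [r[0] for r in conn.execute(PLAIN)]
distinct = [r[0] for r in conn.execute(DISTINCT)]
print(plain, distinct)
```

A unique constraint on (book_id, cat_id) would be another way to enforce the assumption at the schema level.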

Evan Carroll 2010-06-23 14:51:16
