views:

168

answers:

5

Hi,

i have the following data in my table called cat_product

cat_id product_id
1          2
1          3
2          2
2          4

If given a set of values for product_id (2,3) i want to know the unique cat_id. In this case, that will be cat_id 1.

how should i construct mysql query?

i tried to use

select distinct cat_id from cat_product where product_id IN (2,3) 

but it returns both 1 and 2.

if i use

select distinct cat_id from cat_product where product_id NOT IN (2,3) 

i get 2.

is there a better way than

   select distinct cat_id from cat_product where product_id IN (2,3) 
and cat_id NOT IN 
(select distinct cat_id from cat_product where product_id NOT IN (2,3) )

i need to return the category_id that has the EXACT set of product id i am looking for.

basically i have about 10 product ids as input.

A: 

Your own solution would not work (at least not if I understand the question correctly). It will return the id of a category that contains one or more of the listed products and no other products. Try adding the following row to your table, and see if you get the expected result:

insert into cat_product (cat_id, product_id) values (1,5)

If you really need to find the id of a category that has all of the listed products (no matter what other products might be in the category), try this query:

select cat_id
from cat_product
where product_id in (2,3)
group by cat_id
having count(*) = 2

The number 2 on the last line of the query corresponds to the size of the set of products you are searching for. If you are executing the query using some parameterized API, make sure to create an additional parameter bound to productsArray.length or similar.

Jørn Schou-Rode
i need to return the category_id that has the EXACT set of product id i am looking for.
As in **exact** and **nothing else**?
Jørn Schou-Rode
yes exact and nothing else. the convoluted query i put works for me. i need to ask if there is a better AND faster way. and all categories have 10 AND exactly 10 products associated with each category.
I don't believe your current solution gives the correct result - see the first four lines of my answer.
Jørn Schou-Rode
A: 
SELECT * FROM
  (SELECT cat_id FROM cat_product WHERE product_id=2) a
INNER JOIN
  (SELECT cat_id FROM cat_product WHERE product_id=3) b
ON a.cat_id = b.cat_id;
Amber
what if i have about 10 values for the product_id? basically (2, 3, 5, 7, 8, 11, 12, 13, 14, 20)?
+1  A: 
SELECT  cat_id
FROM    (
        SELECT  DISTINCT cat_id
        FROM    cat_product
        ) cpo
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    cat_product cpi
        WHERE   cpi.cat_id = cpo.cat_id
                AND product_id IN (2, 3)
        LIMIT 1, 1
        )

You need to have a UNIQUE index on (cat_id, product_id) (in this order) for this to work fast.

This solution will use INDEX FOR GROUP BY to get a list of distinct categories, and EXISTS predicate will be a little bit faster than COUNT(*) (since the aggregation requires some overhead).

If you have more than two products to search for, adjust the first argument to LIMIT accordingly.

It should be LIMIT n - 1, 1, where n is the number of items in the IN list.

Update:

To return the categories holding all products from the list and nothing else, use this:

SELECT  cat_id
FROM    (
        SELECT  DISTINCT cat_id
        FROM    cat_product
        ) cpo
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    cat_product cpi
        WHERE   cpi.cat_id = cpo.cat_id
                AND product_id IN (2, 3)
        LIMIT 1, 1
        )
        AND NOT EXISTS
        (
        SELECT  NULL
        FROM    cat_product cpi
        WHERE   cpi.cat_id = cpo.cat_id
                AND product_id NOT IN (2, 3)
        )
Quassnoi
AND product_id IN (2, 3) is there an error here? is it cpi.product_id or cpo.product_id
`cpo` does not return `product_id`, so it's `cpi.product_id`. There is no error, but you can prepend the alias if you want.
Quassnoi
This query will return any category that has **all** of the listed products and **possibly other** products as well. According to the OP's comments on my answer, this is not the desired behavior.
Jørn Schou-Rode
A: 

Find the cat_id's that contains all the given products and nothing else.

Let's create some test data...

create table cat_product (
    cat_id int not null,
    product_id int not null,
    primary key (cat_id, product_id));

delete from cat_product;
insert into cat_product
values
    (1, 2),
    (1, 3),
    (2, 2),
    (2, 4),
    (3, 1),
    (3, 2),
    (3, 3),
    (3, 4),
    (3, 5),
    (4, 1),
    (4, 2),
    (4, 3),
    (4, 4),
    (4, 6),
    (5, 1),
    (5, 2),
    (5, 3),
    (5, 4),
    (5, 5),
    (5, 6);

Plug in your list of product_id's for cp.product_id IN (1, 2, 3, 4, 5)
AND plug in the number of product_id's on the last line of the query at cats.match_count = 5.

select
    cats.cat_id
from
    /* Count how many products for each cat match the list of products */
    (select
        cp.cat_id,
        count(*) as match_count
    from
        cat_product cp
    where
        cp.product_id IN (1, 2, 3, 4, 5)
    group by
        cp.cat_id) as cats,
    /* Count the number of products in each cat */
    (select cat_id, count(*) as cat_count
    from cat_product
    group by cat_id) as cat_count
where
    cats.cat_id = cat_count.cat_id
    /* We matched all the products in the cat */
    and cats.match_count = cat_count.cat_count
    /* We matched all the products we wanted.  Without this clause 
       the query would also match cats that only contain products in 
       our list (but not all of them) */
    and cats.match_count = 5;
cope360
A: 

Relational Division is what you are looking for

Virat Kadaru