tags:

views:

184

answers:

4

I have a database that stores products "available on the market" and products "still in development" in two separate tables (market_product and dev_product). A third table (substance) contains all substances a product can made of. Other two tables (marked_product_comp and dev_product_comp) mantains product compositions.

I want to select products still in development that are made of the same ingredients of marketed products.

In the following (simplified) example the query must select product with ID = 2 from dev_product table.

CREATE table market_product (ID SERIAL PRIMARY KEY);
CREATE table dev_product (ID SERIAL PRIMARY KEY);
CREATE table substance (ID SERIAL PRIMARY KEY);
CREATE table market_product_comp (prodID SERIAL, substID SERIAL, PRIMARY KEY(prodID,substID));
CREATE table dev_product_comp (devID SERIAL, substID SERIAL, PRIMARY KEY(devID,substID));

INSERT INTO market_product VALUES (1),(2);
INSERT INTO dev_product VALUES (1),(2);
INSERT INTO substance VALUES (1),(2),(3);
INSERT INTO market_product_comp VALUES (1,1),(1,2),(2,3);
INSERT INTO dev_product_comp VALUES (1,2),(2,1),(2,2);

How to write such query?


UPDATE:

Sorry, I haven't noticed I asked my question in an ambiguous way.

I want to select products still in development that have the same composition of at least one marketed product. For example, if there is a dev_product made by substances {1,2} and only one market_product made by substances {1,2,3}, I want to discard that dev_product, because it has a different composition. I hope this clarify.

A: 
select d.* from dev_product d
 left join dev_product_comp dpc on d.Id = dpc.devId
where dpc.substID in 
  (select mpc.substID from market_product_comp  mpc 
    left join market_product mp on mp.Id = mpc.prodId)
Gregoire
A: 

Select only dev product ids where all the products substances are used in market products.

select 
   dp.id
from 
   dev_product dp
   inner join dev_product_comp dpc on dp.id = dpc.devid
where 
   dpc.substid in (select substid from market_product_comp) 
group by 
   dp.id
having 
   count() = (select count() from dev_product_comp where devid = dp.id)

Excludes products with ANY ingredients not used in production.

Joel Potter
A: 

In MySQL:

SELECT  *
FROM    dev_product dp
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    market_product mp
        WHERE   NOT EXISTS
                (
                SELECT  NULL
                FROM    dev_product_comp dpc
                WHERE   dpc.prodID = dp.id
                        AND NOT EXISTS
                        (
                        SELECT  NULL
                        FROM    market_product_comp mpc
                        WHERE   mpc.prodID = mp.id
                                AND mpc.substID = dpc.substID
                        )
                )
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    market_product_comp mpc
                WHERE   mpc.prodID = mp.id
                        AND NOT EXISTS
                        (
                        SELECT  NULL
                        FROM    dev_product_comp dpc
                        WHERE   dpc.prodID = dp.id
                                AND dpc.substID = mpc.substID
                        )
                )

        )

In PostgreSQL:

SELECT  *
FROM    dev_product dp
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    market_product mp
        WHERE   NOT EXISTS
         (
         SELECT  NULL
         FROM    (
          SELECT substID
          FROM market_product_comp mpc
          WHERE mpc.prodID = mp.ID
          ) m
         FULL OUTER JOIN
          (
          SELECT substID
          FROM dev_product_comp dpc
          WHERE dpc.devID = dp.ID
          ) d
         ON d.substID = m.substID
         WHERE d.substID IS NULL OR m.substID IS NULL
         )
        )

Neither from these queries uses COUNT(*): it's enough to find but a single non-matching component to stop evaluating the whole pair.

See these entries in my blog for explanations:

Quassnoi
+1  A: 

Here's a solution that relies on the fact that COUNT() ignores NULLs.

SELECT d1.devId, m1.prodId
FROM market_product_comp m1
CROSS JOIN dev_product_comp d1
LEFT OUTER JOIN dev_product_comp d2 
   ON (d2.substId = m1.substId AND d1.devId = d2.devId)
LEFT OUTER JOIN market_product_comp m2 
   ON (d1.substId = m2.substId AND m1.prodId = m2.prodId)
GROUP BY d1.devId, m1.prodId
HAVING COUNT(d1.substId) = COUNT(d2.substId)
   AND COUNT(m1.substId) = COUNT(m2.substId);

I tested this on MySQL 5.0.75, but it's all ANSI standard SQL so it should work on any brand of SQL database.

Bill Karwin
Bill, I like your solution too, especially because it is more informative(in the output I can see the matching products). I vote it up.But I can choose only one answer, and I choose the Quassnoi one, because his solution seems to run faster than yours, and it's more easy to understand (at least to me).
Hobbes
Thanks. No problem, I know the importance of understanding your code so it's maintainable! :-)
Bill Karwin