views:

204

answers:

6

Given the following tables:

Recipes
| id | name
| 1  | 'chocolate cream pie'
| 2  | 'banana cream pie'
| 3  | 'chocolate banana surprise'

Ingredients
| id | name
| 1  | 'banana'
| 2  | 'cream'
| 3  | 'chocolate'

RecipeIngredients
| recipe_id | ingredient_id
|     1     |      2
|     1     |      3
|     2     |      1
|     2     |      2
|     3     |      1
|     3     |      3

How do I construct a SQL query to find recipes where ingredients.name = 'chocolate' and ingredients.name = 'cream'?

+1  A: 

If you're searching for multiple associations then the simplest way to write the query is to use multiple EXISTS conditions instead of a single straight JOIN.

SELECT r.id, r.name
FROM Recipes r
WHERE EXISTS
(
    SELECT 1
    FROM RecipeIngredients ri
    INNER JOIN Ingredients i
        ON i.id = ri.ingredient_id
    WHERE ri.recipe_id = r.id
    AND i.name = 'chocolate'
)
AND EXISTS
(
    SELECT 1
    FROM RecipeIngredients ri
    INNER JOIN Ingredients i
        ON i.id = ri.ingredient_id
    WHERE ri.recipe_id = r.id
    AND i.name = 'cream'
)

If you know for sure that the associations are unique (i.e. a single recipe can only have a single instance of each ingredient), then you can cheat a bit using a grouping subquery with a COUNT function and possibly speed it up (performance will depend on the DBMS):

SELECT r.id, r.Name
FROM Recipes r
INNER JOIN RecipeIngredients ri
    ON ri.recipe_id = r.id
INNER JOIN Ingredients i
    ON i.id = ri.ingredient_id
WHERE i.name IN ('chocolate', 'cream')
GROUP BY r.id, r.Name
HAVING COUNT(*) = 2

Or, if a recipe might have multiple instances of the same ingredient (no UNIQUE constraint on the RecipeIngredients association table), you can replace the last line with:

HAVING COUNT(DISTINCT i.name) = 2
Aaronaught
@OMG Ponies: This caveat was mentioned in the second paragraph. But I suppose it wouldn't hurt to add the alternative (it will run slower).
Aaronaught
Sorry, I didn't read closely
OMG Ponies
+1  A: 
select r.*
from Recipes r
inner join (
    select ri.recipe_id
    from RecipeIngredients ri 
    inner join Ingredients i on ri.ingredient_id = i.id
    where i.name in ('chocolate', 'cream')
    group by ri.recipe_id
    having count(distinct ri.ingredient_id) = 2
) rm on r.id = rm.recipe_id
RedFilter
+3  A: 

Use:

  SELECT r.name
    FROM RECIPES r
    JOIN RECIPEINGREDIENTS ri ON ri.recipe_id = r.id
    JOIN INGREDIENTS i ON i.id = ri.ingredient_id
                      AND i.name IN ('chocolate', 'cream')
GROUP BY r.name
  HAVING COUNT(DISTINCT i.name) = 2

The key point here is that the count must equal the number of ingredient names. If it's not a distinct count, there's a risk of false positives due to duplicates.

OMG Ponies
The first option looks like what I want. I tried it with no joy using PostgreSQL 8.3.5 and SQLite3. I tried with just one ingredient and for each of the tree ingredients, in turn, I get 2 rows back as I would expect. But, when I try with 2 ingredients, I get 0 rows.
Bryan Ash
That's what you get for not specifying the databases are you testing against. The S in SQL doesn't mean "Standardized"; beyond a basic SELECT there's almost no chance of being 100% portable to all vendors.
OMG Ponies
Which database would this work with? Maybe I can compare syntax and see how to apply it to PostgreSQL. Your style of solution is easier for me to understand than the NOT EXISTS
Bryan Ash
@Bryan - RE: Easier to understand. The double negative does take a bit of getting used to. To reformulate it in English it is saying `Select those recipes where there isn't an ingredient in the list (chocolate, cream) that isn't in the recipe.`
Martin Smith
@OMG That query doesn't look quite right as is to me. It seems to be saying that for each row ri.ingredient_id will have 2 different values.
Martin Smith
@OMG but you are doing two mutually exclusive inner joins from it's ingredient_id. See here for SQL Server results http://cloudexchange.cloudapp.net/stackoverflow/q/3092
Martin Smith
@Martin Smith: You're right, I confirmed on SQL Server 2008 Express - it's because of the inner join.
OMG Ponies
A: 
SELECT DISTINCT r.id, r.name
FROM Recipes r
INNER JOIN RecipeIngredients ri ON
    ri.recipe_id = r.id
INNER JOIN Ingredients i ON
    i.id = ri.ingredient_id
WHERE
    i.name IN ( 'cream', 'chocolate' )

Edited following comment, thanks! This is the right way then:

SELECT DISTINCT r.id, r.name
FROM Recipes r
INNER JOIN RecipeIngredients ri ON
    ri.recipe_id = r.id
INNER JOIN Ingredients i ON
    i.id = ri.ingredient_id AND
    i.name = 'cream'
INNER JOIN Ingredients i2 ON
    i2.id = ri.ingredient_id AND
    i2.name = 'chocolate'
Diego Pereyra
This will also return recipes where cream is used without chocolate or chocolate is used without cream.
Craig Trader
+2  A: 

This is called relational division. A variety of techniques are discussed here.

One alternative not yet given is the double NOT EXISTS

SELECT r.id, r.name
FROM Recipes r
WHERE NOT EXISTS (SELECT * FROM Ingredients i
                  WHERE name IN ('chocolate', 'cream')
                  AND NOT EXISTS
                      (SELECT * FROM RecipeIngredients ri
                       WHERE ri.recipe_id = r.id
                       AND ri.ingredient_id = i.id))
Martin Smith
Martin, thanks for the reference and an answer that works!There is a minor error in the last line, should be "AND ri.ingredient_id = i.id))"
Bryan Ash
@Bryan, Cheers. I've fixed it on my answer just for completeness.
Martin Smith
+1  A: 

a different way:

Version 2 (as stored procedure) revised

select   r.name
from   recipes r
where   r.id  = (select  t1.recipe_id
        from  RecipeIngredients t1 inner join
     RecipeIngredients     t2 on t1.recipe_id = t2.recipe_id
     and     t1.ingredient_id = @recipeId1
     and     t2.ingredient_id = @recipeId2)

Edit 2: [before people start screaming] :)

This can be placed at the top of version 2, which will allow to query by name instead of passing in the id.

select @recipeId1 = recipe_id from Ingredients where name = @Ingredient1
select @recipeId2 = recipe_id from Ingredients where name = @Ingredient2

I've tested version 2, and it works. Most users where linking on the Ingredient table, in this case was totally not needed!

Edit 3: (test results);

When this stored procedure is run these are the results.

The results are of the format (First Recipe_id ; Second Recipe_id, Result)

1,1, Failed
1,2, 'banana cream pie'
1,3, 'chocolate banana surprise'
2,1, 'banana cream pie'
2,2, Failed
2,3, 'chocolate cream pie'
3,1, 'chocolate banana surprise'
3,2, 'chocolate cream pie'
3,3, Failed

Clearly this query does not handle case when both constraints are the same, but works for all other cases.

Edit 4:(handling same constraint case):

replacing this line:

r.id = (select t1...

to

r.id in (select t1...

works with the failed cases to give:

1,1, 'banana cream pie' and 'chocolate banana surprise'
2,2, 'chocolate cream pie' and 'banana cream pie'
3,3, 'chocolate cream pie' and 'chocolate banana surprise'
Darknight
You seem to be assuming that only one recipe_id can match and there is nothing limiting it to chocolate and cream
Martin Smith
Try out version 2 (based on the same idea as first version). I've test it and it does work correctly.
Darknight
You're still using ` r.id =` assuming only one result. Ingredients is needed of course if you are matching up by ingredient name but I guess likely this will come from a list and the Ids will be known.
Martin Smith
Just edited it again, because I knew people would start screaming!! :)
Darknight
@Darknight Have you tested this with RecipeIngredients containing 2 different recipes both containing chocolate and cream?
Martin Smith
see the test results now.
Darknight