I'm trying to find the best way to determine how similar a group of items (in this example, the ingredients in a guacamole recipe) is to each of the other groups of items (the recipes in a table, linked to another table of ingredients).

For instance, I have the following guacamole recipe:

3 Avocados
1 Vine-Ripened Tomatoes
1 Red Onion
3 Jalapenos
1 Sea Salt
1 Pepper

I want to run this recipe against the table of all my recipes to determine whether any other recipe is similar to it (based on ingredients and their counts), ordered by how similar it is. Additionally, I would like it to identify the differences (whether it's just a difference in the count of an ingredient, or a different ingredient).

A possible output would be:

3 Avocados
(- 1 Vine-Ripened Tomatoes)
1 Red Onion
3 Jalapenos
1 Sea Salt
(- 1 Pepper)
(+ Tabasco)
89.5% Identical

This could also be used for the following use case: "Given a list of ingredients in my refrigerator, what can I make to eat?"
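For the refrigerator use case, one possible approach (a sketch, not part of the original answer) is to count, per recipe, how many of its ingredients are missing from the fridge: recipes with zero missing can be made right away, and the next rows are near misses. It assumes a join table `recipe_ingredients(recipe_id, ingredient_id, quantity)` like the one proposed in the answer, runs on SQLite through Python's built-in `sqlite3` module, and ignores quantities. The `fridge` temp table, the `missing_counts` helper, and the sample IDs are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def missing_counts(conn: sqlite3.Connection,
                   fridge_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (recipe_id, number_of_missing_ingredients), fewest missing first."""
    # Hypothetical temp table holding the IDs of ingredients currently in the fridge.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS fridge (ingredient_id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM fridge")
    conn.executemany("INSERT OR IGNORE INTO fridge VALUES (?)",
                     [(i,) for i in fridge_ids])
    # LEFT JOIN keeps every recipe ingredient; COUNT(f.ingredient_id) only
    # counts the ones found in the fridge, so the difference is what's missing.
    return conn.execute("""
        SELECT ri.recipe_id,
               COUNT(*) - COUNT(f.ingredient_id) AS missing
        FROM recipe_ingredients AS ri
        LEFT JOIN fridge AS f ON f.ingredient_id = ri.ingredient_id
        GROUP BY ri.recipe_id
        ORDER BY missing, ri.recipe_id
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE recipe_ingredients (
    recipe_id INTEGER NOT NULL,
    ingredient_id INTEGER NOT NULL,
    quantity REAL NOT NULL,
    PRIMARY KEY (recipe_id, ingredient_id))""")
conn.executemany("INSERT INTO recipe_ingredients VALUES (?, ?, ?)",
                 [(1, 1, 3), (1, 2, 1), (1, 3, 1),   # recipe 1: ingredients 1, 2, 3
                  (2, 1, 2), (2, 4, 1)])             # recipe 2: ingredients 1, 4
print(missing_counts(conn, [1, 2, 3]))  # [(1, 0), (2, 1)]
```

Returning the missing count instead of filtering on `missing = 0` keeps the "what am I one ingredient short of?" recipes visible as well.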

Thanks for any assistance in pointing me in the right direction.

A:

Off the top of my head, here are some issues I can see coming up with string matching:

  • 3 Avocados and 2 Avocados both use avocado, but the strings are not a match.
  • 1 tbsp salt and 15ml salt refer to the same quantity of salt but the strings are not a match.
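A minimal sketch of the normalization this implies, assuming volume units are converted to millilitres and ingredient names are mapped through a small alias table before anything is stored. The `TO_ML` factors, the `ALIASES` mapping, and the `normalize` function are hypothetical: 15 ml is the metric tablespoon (a US tablespoon is about 14.8 ml), and a real system would also need weight units and a much larger alias list.

```python
from typing import Tuple

# Hypothetical conversion factors into one canonical volume unit (millilitres).
TO_ML = {"ml": 1.0, "l": 1000.0, "tsp": 5.0, "tbsp": 15.0}

# Hypothetical alias table mapping spelling variants to one canonical name.
ALIASES = {"avocados": "avocado", "jalapenos": "jalapeno", "tomatoes": "tomato"}


def normalize(quantity: float, unit: str, name: str) -> Tuple[str, float]:
    """Return (canonical ingredient name, quantity in the canonical unit)."""
    key = " ".join(name.lower().split())   # case- and whitespace-insensitive
    key = ALIASES.get(key, key)
    # Units not in the table (including "") are treated as plain counts.
    factor = TO_ML.get(unit.strip().lower(), 1.0)
    return key, quantity * factor


print(normalize(1, "tbsp", "Salt"))    # ('salt', 15.0)
print(normalize(15, "ml", "salt"))     # ('salt', 15.0)
print(normalize(3, "", "Avocados"))    # ('avocado', 3.0)
```

With both issues handled at insert time, "1 tbsp salt" and "15ml salt" end up as the same row values, so later comparisons can be plain equality on IDs and numbers.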

You might want to keep a table of recipe ingredients that also stores normalized quantities (i.e., every quantity is converted to a specific unit before it is put into the DB). I'm assuming you already have a table for recipes and a table for ingredients, both of which are referenced here as foreign keys (making this a join table):

CREATE TABLE recipe_ingredients (
  recipe_id INT NOT NULL,
  ingredient_id INT NOT NULL,
  -- normalized quantity; a bare DECIMAL defaults to DECIMAL(10,0) in MySQL and would drop fractions
  quantity DECIMAL(10, 3) NOT NULL,
  PRIMARY KEY (recipe_id, ingredient_id),
  FOREIGN KEY (recipe_id) REFERENCES recipes (id),
  FOREIGN KEY (ingredient_id) REFERENCES ingredients (id)
);

Then, when determining matches, you can find which recipes contain the most of the ingredients you're looking for (this ignores quantities):

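-- recipe_ingredients already holds ingredient_id, so no join to ingredients is needed
-- (the original RIGHT JOIN behaved as an inner join anyway because of the WHERE filter)
SELECT ri.recipe_id, COUNT(*) AS num_common_ingredients
FROM recipe_ingredients AS ri
WHERE ri.ingredient_id IN (?) -- IDs of the ingredients being searched for (bind one placeholder per ID)
GROUP BY ri.recipe_id
ORDER BY num_common_ingredients DESC;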

The rows with the highest count share the most ingredients with your search, which makes them the most similar (ignoring quantities).
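Raw counts favour recipes with long ingredient lists: a recipe that contains everything you searched for plus several extras ties with an exact match. One common way to turn the counts into a score, closer to what a full-text search returns, is the Jaccard index, common / (searched + recipe_size - common). The sketch below is an assumption rather than part of the answer: it runs on SQLite through Python's `sqlite3` module, and `rank_recipes` and the sample data are hypothetical. It builds one `?` per ingredient ID, since a single `?` placeholder binds one value, not a list.

```python
import sqlite3
from typing import List, Sequence, Tuple


def rank_recipes(conn: sqlite3.Connection,
                 search_ids: Sequence[int]) -> List[Tuple[int, int, float]]:
    """Return (recipe_id, common_ingredients, jaccard_score), best first."""
    ids = sorted(set(search_ids))
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)   # one placeholder per ID
    # CASE keeps the SUM portable to databases that cannot sum booleans.
    rows = conn.execute(f"""
        SELECT recipe_id,
               SUM(CASE WHEN ingredient_id IN ({placeholders}) THEN 1 ELSE 0 END) AS common,
               COUNT(*) AS recipe_size
        FROM recipe_ingredients
        GROUP BY recipe_id
    """, ids).fetchall()
    # Jaccard index: shared ingredients / ingredients in either list.
    ranked = [(rid, common, common / (len(ids) + size - common))
              for rid, common, size in rows if common > 0]
    ranked.sort(key=lambda r: (-r[2], r[0]))
    return ranked


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE recipe_ingredients (
    recipe_id INTEGER NOT NULL, ingredient_id INTEGER NOT NULL,
    quantity REAL NOT NULL, PRIMARY KEY (recipe_id, ingredient_id))""")
rows = ([(1, i, 1) for i in (1, 2, 3)] +            # exact match for the search
        [(2, i, 1) for i in (1, 4)] +
        [(3, i, 1) for i in (1, 2, 3, 5, 6, 7)])    # same 3 plus 3 extras
conn.executemany("INSERT INTO recipe_ingredients VALUES (?, ?, ?)", rows)
print(rank_recipes(conn, [1, 2, 3]))  # [(1, 3, 1.0), (3, 3, 0.5), (2, 1, 0.25)]
```

Recipes 1 and 3 both have a raw count of 3, but the score separates the exact match from the recipe that needs three additional ingredients.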

To compare quantities, once you have the recipes that match the most ingredients, you can compare each quantity you were given against the quantity stored in recipe_ingredients.
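Once a candidate's rows have been fetched from `recipe_ingredients` into a dictionary, the quantity comparison and the diff the question asks for can be done in application code. This is a sketch under assumptions: ingredients are keyed by normalized name (IDs would work the same way), and the score is a weighted Jaccard index (sum of the smaller quantity per ingredient divided by the sum of the larger), which is one reasonable choice rather than anything the answer prescribes, so it will not reproduce the question's 89.5% figure. `diff_recipes` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def diff_recipes(mine: Dict[str, float],
                 other: Dict[str, float]) -> Tuple[List[str], float]:
    """Diff two {ingredient: normalized quantity} maps, plus a 0..1 similarity."""
    # Keep my recipe's order, then append ingredients only the other recipe has.
    names = list(mine) + [n for n in other if n not in mine]
    lines: List[str] = []
    overlap = total = 0.0
    for name in names:
        a, b = mine.get(name, 0), other.get(name, 0)
        overlap += min(a, b)   # quantity both recipes share
        total += max(a, b)     # quantity either recipe needs
        if a == b:
            lines.append(f"{a:g} {name}")
        elif b == 0:
            lines.append(f"(- {a:g} {name})")       # missing from the other recipe
        elif a == 0:
            lines.append(f"(+ {b:g} {name})")       # extra in the other recipe
        else:
            lines.append(f"{a:g} -> {b:g} {name}")  # same ingredient, different amount
    return lines, (overlap / total if total else 1.0)


mine = {"avocado": 3, "vine-ripened tomato": 1, "red onion": 1,
        "jalapeno": 3, "sea salt": 1, "pepper": 1}
other = {"avocado": 3, "red onion": 1, "jalapeno": 3, "sea salt": 1, "tabasco": 1}
lines, score = diff_recipes(mine, other)
print("\n".join(lines))
print(f"{score:.1%} similar")  # 72.7% similar
```

Because the score weighs quantities, a recipe that differs only in "3 -> 2 jalapeno" scores higher than one that drops an ingredient entirely, which matches the question's distinction between count differences and ingredient differences.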

Daniel Vandersluis
Ideally I would've liked it to return a score, like a full-text search does; that would probably still work if I put all the ingredients into a blob, but it wouldn't do the diff accurately.
Typhon