tags:

views:

81

answers:

2

Hey everyone, I am currently building a highly efficient website on culinary recipes (let's just put it that way).

Each recipe has 10 or less ingredients.

So given an ingredient, how can I find 1) All the recipes that the subject is present, and 2) All the other ingredients that are present in each of the recipes in 1) QUICKLY.

My initial planning is this: (and I will explain why: note that the syntax is not specific to a language so that it's easier for everyone to understand, Note that I am doing this first on MySQL, then if possible, I might want to move on to something faster with less functionality like tokyodb, which I am a big fan of).

Create TABLE mapping:
# This table holds all the names of everything, hopefully storing int
# will cause less IO
id int(32)
name char(32) 
index/key (name) 

Create table recipes:
# Note that ing1 - ing10 are not nulls, zero will be used for empty
name char(32)
ing1 int(32)
ing2 int(32)
ing3 int(32)
ing4 int(32)
ing5 int(32)
ing6 int(32)
ing7 int(32)
ing8 int(32)
ing9 int(32)
ing10 int(32)
index/key on name

create table relationship:
# This table holds the relationship between ingredients and recipe
ing int(32)
recipe int(32) 
key/index on ing

What do you guys think? Can someone think of a better implementation?

A: 

I think you will need a separate Ingredient table - your Mappings doesn't have room for details of ingredients, conceivably you might want to include supplier references or product codes one day.

Your relationship table already allows you to deduce the recipe-incredient set, hence you don't need the ing1-10 fields.

I would call relationship table an ingredientUsage table. It may need fields such as quantities.

You can now easily search for ingredientUsages where ing is your chosen id, and join to the recipe and join back to igredientUsage and ingredient for the other ingredients.

djna
OK, let say all the information is in fact "1) All the recipes that the subject is present, and 2) All the other ingredients that are present in each of the recips"If I implement your architecture, then every recipe lookup will requires my db to scan through the entire "relationship" table. Wouldn't that affect performance significantly? (even if there's indexing). You see, I treat the act of retreiving recipes much more crucial then for ingredients-recipe lookup.
It's not scanning if it's using an index. Realtionship will surely be indexed on recipeId and ingredientId. Intuition is not a reliable guide, one should verify any assumption, but my guess is that this should be fine. That does remind me, whatever design you come up with it should be compartively little work to build a dummy database of the relevent size and verify that your important queries do perform. I would recommend doing that before writing serious code.
djna
+1  A: 

The index of recipes to ingredients is very good. But it also makes your recipe table seem pointless.

A recipe is made up of 4 things:

  1. A list of ingredients.
  2. The quantity of each ingredient.
  3. The name of the recipe.
  4. The preparation and cooking instructions (what might be considered the actual recipe).

So if you have a table that indexes the recipe index number to the ingredient index number, you don't really need to list each ingredient index number within the recipe itself. You could, in fact, have your recipe/ingredient index have two more columns, one for the quantity of the ingredient (a simple floating number between .1 and 100, let's say), and another column for the units for that quantity (ounces, tablespoons, cups, etc.) which, based on your other tables, I imagine you would want to be a integer which ties to a table of possible measurements. Speaking of, you would also want to have a measurement called something like "self", for instances of the ingredient being "eggs" and quantity being "3" so that it would resolve to "3 eggs" without the need for ounces or cups.

Also, you mention that "0 will be used" but in what situation does a recipe call for 0 of something? Just curious.

So the result would be:

#table of recipes names
 id int(32)
 name char(32) 

#table of ingredients
 id int(32)
 ingredient char(32)     

#table of instructions
 id int(32)
 rec_id int(32) #foreign key for recipes
 instruct text

#table of units
 id int(32)
 unit (text)

#table of recipe-ingredients
 id int(32)
 rec_id int(32) #foreign key for recipes
 igred_id (32) #foreign key for ingredient
 quant float(5, 2) #amount per ingred
 unit_id int (32)

Now you can do a query for any or all ingredients checked off, it can return the index of the recipes that match, you output this as a list to the user. The user selects the recipe that gets their attention, which triggers a query for the recipe instructions and another query for the recipe ingredients (which already has the amounts in the same table).

Anthony
man I don't know why you would add so much additional info. Maybe I wasn't really clear. So given an ingredient, how can I find 1) All the recipes that the subject is present, and 2) All the other ingredients that are present in each of the recipes in 1) QUICKLY.
you want recipes with no quantities or instructions? take 'em out. The fundamental shape here is reasonable anyway. Don't say "quickly" say how quickly - in all performance work set an explicit goal and then see if you can reach it. eg. User in browser retrieves one recipe in sub 1 second. Consider whether it matters if your db query contribues 20 or 25 milliseconds, or 100 milliseconds.
djna
@phroxy. I didn't add any additional info. I accounted for info that your example didn't. If anything, I took out info by removing the table for recipe_ingredients. Given my layout, you can now do a simple query for all recipe_numbers with the ingredients given from only one table, and then do a join on the recipe name, if that's all you want. Sorry for trying to make your life easier.
Anthony
@phroxy - the 10-column table is a disaster for query optimization; the solution proposed by Anthony is superior in every respect.
Jonathan Leffler