
No worries! It looks more complex than it actually is! Just get down to the drinks!

TL;DR: How do you efficiently query and update entities that have relationships to other entities?

Here's an interesting data modeling scenario with two tables that has been puzzling me:

Entities { ID, Name, ScalarValue }

ComponentEntities { AggregateEntityID, ComponentEntityID, quantity }

AggregateEntityID and ComponentEntityID are foreign keys to the Entities table.
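A minimal sketch of that schema, using Python's built-in sqlite3 module as a stand-in for whatever RDBMS is actually in use. The table and column names follow the question. The composite primary key, the CHECK constraint and the helper try_add_component are my own assumptions. Note that the CHECK only rejects direct self-reference; indirect cycles (A contains B contains A) are not caught by it.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite only enforces foreign keys when this pragma is on (per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Entities (
            ID          INTEGER PRIMARY KEY,
            Name        TEXT NOT NULL,
            ScalarValue REAL            -- NULL for aggregates whose value is derived
        );
        CREATE TABLE ComponentEntities (
            AggregateEntityID INTEGER NOT NULL REFERENCES Entities(ID),
            ComponentEntityID INTEGER NOT NULL REFERENCES Entities(ID),
            quantity          REAL    NOT NULL,
            PRIMARY KEY (AggregateEntityID, ComponentEntityID),
            -- Rejects direct self-reference only; longer cycles pass this check.
            CHECK (AggregateEntityID <> ComponentEntityID)
        );
    """)

def try_add_component(conn, aggregate_id, component_id, quantity) -> bool:
    """Hypothetical helper: returns False if a CHECK, PK or FK constraint rejects the row."""
    try:
        conn.execute("INSERT INTO ComponentEntities VALUES (?, ?, ?)",
                     (aggregate_id, component_id, quantity))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO Entities VALUES (1, 'Vodka', 40.0), (4, 'Bloody mary', NULL)")
```

With this in place, try_add_component(conn, 4, 1, 0.2) succeeds, while try_add_component(conn, 4, 4, 1.0) (self-reference) and try_add_component(conn, 4, 99, 1.0) (unknown entity) are rejected.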

Give me the bloody example already

Drinks { ID, Name, Alcohol% }

DrinkIngredients { CocktailID, IngredientID, amount }

Drinks { 1, "Vodka", 40% }
Drinks { 2, "Tomato juice", 0% }
Drinks { 3, "Tabasco", 0% }
Drinks { 4, "Bloody mary", - }

DrinkIngredients { 4, 1, 0.2 } // Bloody mary has 0.2*Vodka
DrinkIngredients { 4, 2, 0.7 } // Bloody mary has 0.7*Tomato juice
DrinkIngredients { 4, 3, 0.1 } // Bloody mary has 0.1*Tabasco

If we wanted to get Bloody Mary's alcohol content, we would SELECT * FROM DrinkIngredients WHERE CocktailID = 4 and sum each ingredient's amount multiplied by its Alcohol%.
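One level deep, that is just a weighted sum over the direct ingredients. A minimal sketch in SQLite via Python's sqlite3; I've renamed Alcohol% to alcohol_pct (a hypothetical name, since % isn't valid in a bare SQL identifier) and stored Bloody mary's unknown value as NULL. The function name direct_alcohol_pct is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Drinks (ID INTEGER PRIMARY KEY, Name TEXT, alcohol_pct REAL);
    CREATE TABLE DrinkIngredients (CocktailID INTEGER, IngredientID INTEGER, amount REAL);
    INSERT INTO Drinks VALUES (1, 'Vodka', 40), (2, 'Tomato juice', 0),
                              (3, 'Tabasco', 0), (4, 'Bloody mary', NULL);
    INSERT INTO DrinkIngredients VALUES (4, 1, 0.2), (4, 2, 0.7), (4, 3, 0.1);
""")

def direct_alcohol_pct(conn, cocktail_id):
    # Weighted sum over *direct* ingredients only. This is correct here only
    # because every ingredient of Bloody mary is a leaf with a known alcohol_pct.
    row = conn.execute("""
        SELECT SUM(di.amount * d.alcohol_pct)
        FROM DrinkIngredients AS di
        JOIN Drinks AS d ON d.ID = di.IngredientID
        WHERE di.CocktailID = ?
    """, (cocktail_id,)).fetchone()
    return row[0]

print(direct_alcohol_pct(conn, 4))  # 8.0 (= 0.2 * 40)
```

If Bloody Mary Pink were added, the same query on ID 7 would quietly return only the Passion term: 0.8 * NULL is NULL and SUM skips NULLs. That is exactly where the recursion in the rest of the question comes in.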

Pretty standard; nothing weird there. Lisa likes to make it a bit sweeter by adding some Passion to it:

Drinks { 6, "Passion", 13% }
Drinks { 7, "Bloody Mary Pink", - }

DrinkIngredients { 7, 4, 0.8 }  // Bloody Mary Pink has 0.8*Bloody Mary
DrinkIngredients { 7, 6, 0.2 }  // Bloody Mary Pink has 0.2*Passion

Lisa's mum has been tasting these for so long that she believes she has found the ultimate blend between the two:

Drinks { 8, "Bloody Milf", - }
DrinkIngredients { 8, 4, 0.45 } // Bloody Milf has 0.45*Bloody Mary
DrinkIngredients { 8, 7, 0.55 } // Bloody Milf has 0.55*Bloody Mary Pink

Add a couple more levels of these "consists of" relations and we have a deep relational recursion. The only restriction is that an entity cannot consist of itself, directly or indirectly.
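To make the recursion concrete, here is a minimal in-memory sketch in Python. The dictionaries ALCOHOL_PCT and INGREDIENTS are hypothetical stand-ins for the two tables. A drink is evaluated by recursing into its ingredients until leaves are reached; functools.lru_cache memoizes each drink, which matters because in such a structure the same sub-drink (here Bloody mary) can be reached along several paths.

```python
from functools import lru_cache

# Leaf drinks: known alcohol percentage.
ALCOHOL_PCT = {1: 40.0, 2: 0.0, 3: 0.0, 6: 13.0}

# Composite drinks: CocktailID -> list of (IngredientID, amount).
INGREDIENTS = {
    4: [(1, 0.2), (2, 0.7), (3, 0.1)],   # Bloody mary
    7: [(4, 0.8), (6, 0.2)],             # Bloody Mary Pink
    8: [(4, 0.45), (7, 0.55)],           # Bloody Milf
}

@lru_cache(maxsize=None)
def alcohol_pct(drink_id: int) -> float:
    """Recursively evaluates a drink; each drink is computed at most once."""
    if drink_id in ALCOHOL_PCT:
        return ALCOHOL_PCT[drink_id]
    return sum(amount * alcohol_pct(ingredient)
               for ingredient, amount in INGREDIENTS[drink_id])

for drink in (4, 7, 8):
    print(drink, round(alcohol_pct(drink), 4))
# prints 4 8.0, then 7 9.0, then 8 8.55
```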

This seems to form a directed acyclic graph (DAG).
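Keeping the graph acyclic means checking each new DrinkIngredients row before inserting it. A minimal sketch, assuming the edges are held in memory as a dict from CocktailID to IngredientIDs (the function names are hypothetical): adding cocktail → ingredient closes a cycle exactly when the cocktail is already reachable from the ingredient, or when both are the same drink.

```python
def reachable(edges, start, target):
    """Iterative depth-first search: can `target` be reached from `start`?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False

def would_create_cycle(edges, cocktail_id, ingredient_id):
    # A cycle appears iff the cocktail is already (directly or indirectly)
    # an ingredient of the new ingredient; this also covers cocktail == ingredient.
    return reachable(edges, ingredient_id, cocktail_id)

# CocktailID -> IngredientIDs, taken from the example above.
edges = {4: [1, 2, 3], 7: [4, 6], 8: [4, 7]}

print(would_create_cycle(edges, 4, 8))  # True: Bloody Milf already contains Bloody mary
print(would_create_cycle(edges, 8, 6))  # False: adding Passion to Bloody Milf is fine
```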

RDBMS: One way to "cache" the data would be to calculate the relevant values and store them in the entity itself (or perhaps in another table). In the example above, the alcohol content of Bloody Mary would be calculated once when it's created and stored in its Alcohol% field. The downside is that updates become expensive: changing one drink means updating every drink that consists of it, along with the whole dependency hierarchy above those.
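A minimal sketch of that caching strategy and its update cost, in plain Python. The cached dict plays the role of the stored Alcohol% column, and all names are hypothetical. When a leaf's value changes, every drink that transitively contains it must be recomputed, in an order where ingredients come before the cocktails that use them; graphlib.TopologicalSorter (Python 3.9+) supplies that order.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# CocktailID -> list of (IngredientID, amount), from the example above.
INGREDIENTS = {4: [(1, 0.2), (2, 0.7), (3, 0.1)],
               7: [(4, 0.8), (6, 0.2)],
               8: [(4, 0.45), (7, 0.55)]}
# Stored Alcohol% for every drink, composites included (the "cache").
cached = {1: 40.0, 2: 0.0, 3: 0.0, 6: 13.0, 4: 8.0, 7: 9.0, 8: 8.55}

def dependents(drink_id):
    """All cocktails that directly or indirectly contain drink_id."""
    used_by = {}
    for cocktail, parts in INGREDIENTS.items():
        for ingredient, _ in parts:
            used_by.setdefault(ingredient, []).append(cocktail)
    found, stack = set(), [drink_id]
    while stack:
        for cocktail in used_by.get(stack.pop(), []):
            if cocktail not in found:
                found.add(cocktail)
                stack.append(cocktail)
    return found

def update_leaf(drink_id, new_pct):
    """Changes a leaf value and recomputes every cached drink that depends on it."""
    cached[drink_id] = new_pct
    affected = dependents(drink_id)
    # Predecessors of each affected drink = its ingredients that are also affected.
    graph = {d: [i for i, _ in INGREDIENTS[d] if i in affected] for d in affected}
    for d in TopologicalSorter(graph).static_order():
        cached[d] = sum(amount * cached[i] for i, amount in INGREDIENTS[d])
    return affected

affected = update_leaf(1, 50.0)  # stronger vodka
print({d: round(cached[d], 4) for d in sorted(affected)})  # {4: 10.0, 7: 10.6, 8: 10.33}
```

The work per update grows with the number of drinks that transitively contain the changed one, which is the expense the paragraph above describes.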

Questions

RDBMS: Is there a better way to get to the leaf values (drinks that don't consist of other ones) than getting the "parent" drink until a leaf drink is reached?

Both RDBMS and NoSQL have a problem with this, one way or the other.

Bottom-line: is this even practical and feasible?

What I need is a counter-inception


A: 

Many RDBMSs support recursive queries. See e.g. http://msdn.microsoft.com/en-us/library/ms186243.aspx.

Frank
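As an illustration of such a recursive query, here is a sketch that runs on SQLite through Python's sqlite3 module (SQLite has supported WITH RECURSIVE since 3.8.3; SQL Server's dialect writes the same kind of CTE without the RECURSIVE keyword). It expands Bloody Milf down to its leaf ingredients, multiplying amounts along each path, then sums per leaf. The alcohol_pct column name is an assumption, since Alcohol% isn't a plain SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Drinks (ID INTEGER PRIMARY KEY, Name TEXT, alcohol_pct REAL);
    CREATE TABLE DrinkIngredients (CocktailID INTEGER, IngredientID INTEGER, amount REAL);
    INSERT INTO Drinks VALUES (1,'Vodka',40), (2,'Tomato juice',0), (3,'Tabasco',0),
        (4,'Bloody mary',NULL), (6,'Passion',13), (7,'Bloody Mary Pink',NULL),
        (8,'Bloody Milf',NULL);
    INSERT INTO DrinkIngredients VALUES (4,1,0.2), (4,2,0.7), (4,3,0.1),
        (7,4,0.8), (7,6,0.2), (8,4,0.45), (8,7,0.55);
""")

# Expand a cocktail to its leaves, multiplying amounts along each path.
# UNION ALL keeps one row per path; paths are summed per leaf afterwards.
EXPAND = """
WITH RECURSIVE expand(ingredient_id, amount) AS (
    SELECT IngredientID, amount FROM DrinkIngredients WHERE CocktailID = :drink
    UNION ALL
    SELECT di.IngredientID, e.amount * di.amount
    FROM expand AS e
    JOIN DrinkIngredients AS di ON di.CocktailID = e.ingredient_id
)
SELECT d.Name, SUM(e.amount) AS total_amount,
       SUM(e.amount * d.alcohol_pct) AS alcohol_contribution
FROM expand AS e
JOIN Drinks AS d ON d.ID = e.ingredient_id
WHERE NOT EXISTS (SELECT 1 FROM DrinkIngredients AS x
                  WHERE x.CocktailID = e.ingredient_id)   -- leaves only
GROUP BY d.ID, d.Name
ORDER BY d.ID
"""

rows = conn.execute(EXPAND, {"drink": 8}).fetchall()
for name, amount, contribution in rows:
    print(f"{name}: {amount:.3f} -> {contribution:.3f}%")
print(round(sum(r[2] for r in rows), 4))  # 8.55
```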
A:

"RDBMS: Is there a better way to get to the leaf values (drinks that don't consist of other ones) than getting the "parent" drink until a leaf drink is reached?"

I don't understand this. Finding the drinks that don't consist of other ones has nothing to do with recursion; it's a simple EXCEPT or WHERE NOT EXISTS.
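A sketch of both forms, run on SQLite via Python's sqlite3 against the example data (tables reduced to the columns needed; the query variable names are hypothetical). A leaf is any drink that never appears as a CocktailID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Drinks (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DrinkIngredients (CocktailID INTEGER, IngredientID INTEGER, amount REAL);
    INSERT INTO Drinks VALUES (1,'Vodka'), (2,'Tomato juice'), (3,'Tabasco'),
        (4,'Bloody mary'), (6,'Passion'), (7,'Bloody Mary Pink'), (8,'Bloody Milf');
    INSERT INTO DrinkIngredients VALUES (4,1,0.2), (4,2,0.7), (4,3,0.1),
        (7,4,0.8), (7,6,0.2), (8,4,0.45), (8,7,0.55);
""")

# Leaf drinks never appear as a CocktailID: no recursion needed.
NOT_EXISTS_QUERY = """
SELECT d.ID FROM Drinks AS d
WHERE NOT EXISTS (SELECT 1 FROM DrinkIngredients AS di WHERE di.CocktailID = d.ID)
ORDER BY d.ID
"""
# The same set, written with EXCEPT.
EXCEPT_QUERY = """
SELECT ID FROM Drinks
EXCEPT
SELECT CocktailID FROM DrinkIngredients
ORDER BY 1
"""

print([r[0] for r in conn.execute(NOT_EXISTS_QUERY)])  # [1, 2, 3, 6]
print([r[0] for r in conn.execute(EXCEPT_QUERY)])      # [1, 2, 3, 6]
```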

And "getting to the leaf values" (given a parent) will inevitably require traversing the tree, regardless of the data structure (relational or hierarchical) used to model it, wouldn't you think?

"Both RDBMS and NoSQL have a problem with this, one way or the other."

RDBMSs don't really have a problem with this. The problem was identified a few decades ago (in the '80s or so) and was addressed by extending the relational algebra with a transitive closure operation and a generalized version of it. SQL supports this through recursive queries, and as Frank said, at least all the big dogs support recursive queries one way or the other.
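In SQL terms, the transitive closure of DrinkIngredients is the set of (ancestor, descendant) pairs connected by one or more rows. A minimal SQLite sketch of that closure as a recursive CTE, run through Python's sqlite3. It uses UNION rather than UNION ALL, so duplicate pairs are discarded; SQLite then never revisits a pair, so the recursion would terminate even if a cycle slipped in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DrinkIngredients (CocktailID INTEGER, IngredientID INTEGER, amount REAL);
    INSERT INTO DrinkIngredients VALUES (4,1,0.2), (4,2,0.7), (4,3,0.1),
        (7,4,0.8), (7,6,0.2), (8,4,0.45), (8,7,0.55);
""")

# (a, b) is in the closure if b is reachable from a through DrinkIngredients.
CLOSURE = """
WITH RECURSIVE closure(ancestor, descendant) AS (
    SELECT CocktailID, IngredientID FROM DrinkIngredients
    UNION
    SELECT c.ancestor, di.IngredientID
    FROM closure AS c
    JOIN DrinkIngredients AS di ON di.CocktailID = c.descendant
)
SELECT ancestor, descendant FROM closure ORDER BY ancestor, descendant
"""

pairs = conn.execute(CLOSURE).fetchall()
print([d for a, d in pairs if a == 8])  # [1, 2, 3, 4, 6, 7]
```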

"Bottom-line: is this even practical and feasible?"

Writing recursive queries isn't exactly trivial if you've never done it before. Does that make it "unpractical"? I wouldn't know.

Erwin Smout
Transitive closure was the missing link! The solution is simply to flatten the recursive relations by also storing the intermediate via-paths. It's a space-time trade-off. Thank you for the inception. :D
randomguy
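A minimal sketch of that flattening in SQLite via Python's sqlite3. The DrinkClosure table, the alcohol_pct column and the function names are hypothetical. The closure is stored together with the total amount of every direct or indirect ingredient, summed over all via-paths. Reads then need one join and no recursion, and changing a leaf's Alcohol% needs no propagation at all. The cost moves to storage and to rebuilding the closure when a recipe changes; this sketch simply rebuilds everything, which a real system would likely narrow to the affected cocktails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Drinks (ID INTEGER PRIMARY KEY, Name TEXT, alcohol_pct REAL);
    CREATE TABLE DrinkIngredients (CocktailID INTEGER, IngredientID INTEGER, amount REAL);
    INSERT INTO Drinks VALUES (1,'Vodka',40), (2,'Tomato juice',0), (3,'Tabasco',0),
        (4,'Bloody mary',NULL), (6,'Passion',13), (7,'Bloody Mary Pink',NULL),
        (8,'Bloody Milf',NULL);
    INSERT INTO DrinkIngredients VALUES (4,1,0.2), (4,2,0.7), (4,3,0.1),
        (7,4,0.8), (7,6,0.2), (8,4,0.45), (8,7,0.55);
    -- Flattened relation: total amount of each (in)direct ingredient per cocktail.
    CREATE TABLE DrinkClosure (CocktailID INTEGER, IngredientID INTEGER, amount REAL,
                               PRIMARY KEY (CocktailID, IngredientID));
""")

def rebuild_closure(conn):
    """Recomputes DrinkClosure from scratch (run after recipe changes)."""
    conn.execute("DELETE FROM DrinkClosure")
    conn.execute("""
        WITH RECURSIVE paths(root, ingredient, amount) AS (
            SELECT CocktailID, IngredientID, amount FROM DrinkIngredients
            UNION ALL
            SELECT p.root, di.IngredientID, p.amount * di.amount
            FROM paths AS p
            JOIN DrinkIngredients AS di ON di.CocktailID = p.ingredient
        )
        INSERT INTO DrinkClosure (CocktailID, IngredientID, amount)
        SELECT root, ingredient, SUM(amount) FROM paths GROUP BY root, ingredient
    """)

def alcohol_pct(conn, drink_id):
    # No recursion at read time: one join against the flattened table, leaves only.
    return conn.execute("""
        SELECT SUM(c.amount * d.alcohol_pct)
        FROM DrinkClosure AS c
        JOIN Drinks AS d ON d.ID = c.IngredientID
        WHERE c.CocktailID = ?
          AND NOT EXISTS (SELECT 1 FROM DrinkIngredients AS x
                          WHERE x.CocktailID = c.IngredientID)
    """, (drink_id,)).fetchone()[0]

rebuild_closure(conn)
print(round(alcohol_pct(conn, 8), 4))  # 8.55
# A leaf update needs no propagation through the hierarchy:
conn.execute("UPDATE Drinks SET alcohol_pct = 50 WHERE ID = 1")
print(round(alcohol_pct(conn, 8), 4))  # 10.33
```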