views:

369

answers:

4

Consider a database with tables Products and Employees. There is a new requirement to model current product managers, being the sole employee responsible for a product, noting that some products are simple or mature enough to require no product manager. That is, each product can have zero or one product manager.

Approach 1: alter table Product to add a new NULLable column product_manager_employee_ID so that a product with no product manager is modelled by the NULL value.

Approach 2: create a new table ProductManagers with non-NULLable columns product_ID and employee_ID, with a unique constraint on product_ID, so that a product with no product manager is modelled by the absence of a row in this table.

There are other approaches but these are the two I seem to encounter most often.

Assuming these are both legitimate design choices (as I'm inclined to believe) and merely represent differing styles, do they have names? I prefer approach 2 and find it hard to convey the difference in style to someone who prefers approach 1 without employing an actual example (as I have done here!) I'd would be nice if I could say, "I'm prefer the inclination-towards-6NF (or whatever) style myself."

Assuming one of these approaches is in fact an anti-pattern (as I merely suspect may be the case for approach 1 by modelling a relationship between two entities as an attribute of one of those entities) does this anti-pattern have a name?

+6  A: 

Well the first is nothing more than a one-to-many relationship (one employee to many products). This is sometimes referred to as a O:M relationship (zero to many) because it's optional (not every product has a product manager). Also not every employee is a product manager so its optional on the other side too.

The second is a join table, usually used for a many-to-many relationship. But since one side is only one-to-one (each product is only in the table once) it's really just a convoluted one-to-many relationship.

Personally I prefer the first one but neither is wrong (or bad).

The second would be used for two reasons that come to mind.

  1. You envision the possibility that a product will have more than one manager; or
  2. You want to track the history of who the product manager is for a product. You do this with, say a current_flag column set to 'Y' (or similar) where only one at a time can be current. This is actually a pretty common pattern in database-centric applications.
cletus
+2  A: 

It looks to me like the two model different behaviour. In the first example, you can have one product manager per product and one employee can be product manager for more than one product (one to many). The second appears to allow for more than one product manager per product (many to many). This would suggest the two solutions are equally valid in different situations and which one you use would depend on the business rule.

1800 INFORMATION
Fair point but I didn't intend it to look that way. I've now tired to edit to clarify my intent that each product will have zero or one product manager (for approach 2 this requires a unique constraint on product manager ID only, I think).
onedaywhen
A: 

There is a flaw in the first approach. Imagine for a second, that the business requirements have changed and now you need to be able to set 2 Product Manager to a product. What will you do? Add another column to the table Product? Yuck. This obviously violates 1NF then.

Another option the second approach gives is an ability to store some attributes for a certain Product Manager <-> Product relation. Like, if you have two Product Manager for a product, then you can set one of them as a primary... Or, for example, an employee can have a phone number, but as a product manager he/she can have another phone number... This also goes to the special table then.

Yacoder
I was implicitly applying the YAGNI principle (http://en.wikipedia.org/wiki/You_Ain%27t_Gonna_Need_It). While I agree it does no harm to think about future maintenance, I think both approaches satisfy the current requirements and therefore are acceptable (and would stop short of saying one is 'flawed').
onedaywhen
Of course, they are "acceptable", in terms of solving this particular problem. But the first one really looks more like a quick hack, than a "design-pattern". YAGNI principle is not a "golden rule" as well, I'd say "Requirements Change" rule is more of a factor to consider.
Yacoder
I'd argue that introducing a relation table for a 1:M relationship obfuscates the actual purpose of the system. In the general case, a foreign key (*NOT* a relation table) is the accepted, best-practice solution for 1:M relationships. Perhaps it seems like a quick hack in this instance because it is exceedingly reasonable that some products might have more than one manager.
Hank Gay
A: 

Approach 1)

  1. Slows down the use of the Product table with the additional Product Manager field (maybe not for all databases but for some).
  2. Linking from the Product table to the Employee table is simple.

Approach 2)

  1. Existing queries using the Product table are not affected.
  2. Increases the size of your database. You've now duplicated the Product ID column to another table as well as added unique constraints and indexes to that table.
  3. Linking from the Product table to the Employee table is more cumbersome and costly as you have to ink to the intermediate table first.

How often must you link between the two tables?

How many other queries use the Product table?

How many records in the Product table?

Paul Morgan
"Increases the size of your database. You've now duplicated the Product ID column to another table as well as added unique constraints and indexes to that table" -- that's another of yours that should be qualified with "maybe not for all databases but for some". This is meant to be a design question, rather than implementation: I of course plan to denormalize the whole thing as one huge spreadsheet :) But thanks for your reflections on both approaches.
onedaywhen