views:

214

answers:

5

What is better database design?

Having one big table that could contain different "types" of records eg: employees, cars, cell-phones, and in order to identify each type of record, we have a column called type.

So the table would have columns that look like

id | type | name

1 | car | Ford

2 | car | Toyota

3 | phone | motorola

4 | employee | Jack

5 | employee | Aneesh

6 | phone | Nokia

7 | phone | Motorola

or have different tables for each type

eg:

Employees

id | name

Cars

id | name

Phones

id | name

These tables could have foreign key references from other tables. Now if each table had different columns the decision would have been simple that you can't have that in the same table. So option 1 is probably ruled out (unless all columns that are not common are nullable). But what if these different entities had similar columns, in that case what is better design?

What might be the arguments for and against each?

+5  A: 

Since they are really different types, I'd suggest storing thedm in separate tables. This prevents having to maintain a list of types, since they're all in their own tables. Furthermore, if, in the future one of these types is extended (e.g. you're going to store phone numbers for employees), you're not getting any weird relationships like phone numbers for cars. It also makes your database easier to understand and maintain.

Roald van Doorn
+1  A: 

Seperate tables

LnDCobra
+4  A: 

Roald van Doorn is absolutely right. If you have a single table and you extend it in any way, you'll be violating Second Normal Form. As William Kent said, "Second normal form is violated when a non-key field is a fact about a subset of a key." Roald van Doorn's example of "phone numbers for employees" illustrates the violation.

William Kent's A Simple Guide to Five Normal Forms in Relational Database Theory is a great paper to review when asking yourself a database design question.

simeonwillbanks
+1 for relevant link, which I forgot to include.
Roald van Doorn
+4  A: 

I agree with everyone - definitely use separate tables. You loose nothing by having separate tables - just because you have a few more tables, your database won't get any slower or less manageable.

But you gain a lot - you don't have to have lots of fields that make no sense for one type of entity, and so on. You adhere to 2NF as many have pointed out, and that's definitely a good thing!

Check out this interesting article on Simple Talk called

Five Simple Database Design Errors and how to avoid them

Error #1 is what the author calls the "common lookup table", which sounds a lot like what you're trying to do - but for real live data.

Read the article, internalize all its requirements - excellent stuff and highly recommended!

marc_s
+1 for relevant link, which I forgot to include.
Roald van Doorn
A: 

I'll say that all of the answers above are 1. correct given the example cited in the question and 2. correct for almost all off the time.

Every now an then I come across a situation where a single table is better. It's so infrequent, that when it comes up, I wonder if I need to architect around a monolithic (neutral) entity, or not. I quickly dismiss the urge -- perhaps asking someone else, and I fail to state my case adequately and we fall back on doing it the way we always do it.

Then, as it turns out, much too late in the game I figure out I should have made a single table of a neutral entity type.

Here's an example that makes my case:

Suppose two entity types, a corporation and a person. A corp is typically owned by a person, but sometimes another corporation owns a corporation.

Holding onto that thought, and adding to it, let's say that every corporation has a registered agent who is responsible for the legal creation of the corporation. AND, furthering my illustration, the registered agent can be either a person or another corporation.

Given that the owner/parent to the corporation/child can be either a person or amcorporation, you may begin to see the challenge. In contrast, if only people could own Corporations, your Ownership link table is very conventional with columns: OwnershipID (sort of unecessary), CorporationID, PersonID.

Instead, you need something like: OwnershipID, CorporationID, OwnerID, OwnerType And somehow or another you can make this work, but it won't be fun, to say the least.

Continuing on with the example I gave, you need to assign an agent to every Corporation. Usually, the agent is one of the owners (a person). In which case, you really do want to link back to the one record of that person. You don't want to have record of the person as an owner and then again as an agent (in an Agent table). That would be redundant. Bad things will happen. :-)

Similarly to that "problem" a registered agent can also be a corporation, such as a law firm, a CPA, or a Biz Filings company, calling out some typical examples. Just like the agent-person, an agent-corporation really should not be get its own record. It needs to link back to the already existing record of its corporate existence in the Corporation table. [except that I'm ultimately saying to not have a Corporation table]

Just like the link table that matched each corporation to its owner(s) of any type, person or corporation, you could have an agent link table of: AgentRepresentationID, CorporationID, AgentID, AgentType... but again, it would be ugly (IMO) when you have to pull together the related agents -- some from the Person table, some from the Corporation table.

So instead, in this case, you can see how a neutral entity type can be advantageous. It would be something like this:

Table: EntityAll Key Columns: EntityID, EntityType (or EntityTypeID if you insist, link out to get the description), EntityName (there are concerns with names and different types... off topic to this post)

Link Table: CorporationOwnership Key Columns: OwnershipID (again, my comment that this is kind of unecessary), ChildEntityID (the entity being owned; named "Child" for clarity, I wouldn't name it that) ParentEntityID (the parent entity)

Link Table: AgentRepresentation Key Columns: AgentRepresentationID (...I won't say it), CorporationEntityID (the corp entity being represented), AgentEntityID (from the Entity table, equating to the record that's the agent here)

While you might be OK with my architecture, you should be a little bothered by the column naming in the link tables. It bothers me. Typically the second and third column names in those tables match exactly the name of the columns you JOIN in each entity's respective table (haha, but each entity doesn't have a respective table so you can't have the link table column names match the source column names because they are THE same column). Technically this does not matter, but it will break your naming conventions which should matter, but not enough to not do it.

In case I haven't driven it home well enough yet, here is how you'll pull it together. You JOIN the EntityAll table on itself to get what you need.

List all Corps and their owners (in T-SQL):

SELECT Corp.EntityName as CorpName, Owner.EntityName as OwnerName
FROM EntityAll as Corp
JOIN CorporationOwnership as Link on (Corp.EntityID = Link.ChildEntityID)
JOIN EntityAll as Owner on (Link.ParentEntityID = Owner.EntityID)

Consequently, you'd do the same thing to get the agent, instead of the owner(s).

I realize this is not how we're trained to architect things, but I feel pretty strongly that my solution eliminates redundant data AND makes it easier to code, manage and read.

If you insist I'm wrong, though, let me know. Suggest how you'd architect my example with separate Corporation and Person entity tables. Cheers!

Chris Adragna
I should admit, the example I chose is one that I implemented already by using tables for each entity type (even an agent table). My needs go beyond the basic JOIN logic. Very soon I need to be able to rollup entities across entity types (like the firm John Smith CPA, P.A. and its owner, John Smith can rollup to a logical unit for various purposes). Rollups might even bring in records outside of their already in place linkage defined by the link table(s).
Chris Adragna