views:

40

answers:

2

Let's say we have a object called a Widget, for which we can construct a database table.

Now, let's say we have two sets of additional detail to describe widgets. Each set of data is available at a separate time. So, let's say our widgets have three phases to their life-cycle...

In phase 1, we simply have a widget with a name and a description.

widgets
-------
id (PK)
name
description

In phase 2, our widget gains a height and weight.

widgets
-------
id (PK)
name
description
height
weight

In phase 3, our widget gains a destination and shipping cost.

widgets
-------
id (PK)
name
description
height
weight
destination
shipping_cost

The above schema (for "phase 3") means that the database record for a widget in phase 1 or 2 will have null values.

Alternatively, we could construct a schema that will never have null values (but instead the parent record may have zero, one or two child records depending on the current phase of the widget's lifecycle):

widgets
-------
id (PK)
name
description

widget_specs
-------
id (PK)
widget_id (FK)
height
weight

widget_delivery
-------
id (PK)
widget_id (FK)
destination
shipping_cost

Is one of these alternatives always correct? Are there justifiable pros and cons to each? If the answer depends on more variables, what are they? Under what conditions would one alternative become the obvious preferred choice?

The accepted answer will cite a modern, authoritative source on the topic.

Edit: I feel this could easily be argumentative, but it's also a topic that should have justifiable pros & cons, and therefore an authoritative answer. The question is simply one that has bugged me, because I've seen it done both ways without justification or consideration for the alternative. I'd simply like to know which is correct, according to the current trend-setting DBA types.

+1  A: 

Your option to alleviate null columns is creating one-to-one relationships, or could a widget have more than one weight & delivery specs?

It also means you'll have to LEFT JOIN to both supporting tables to check info, where a single table doesn't need anything special (other than an IS/IS NOT NULL check in certain situations).

One-to-one relationships are a performance optimization, but that's not why you're asking this question...

OMG Ponies
To answer your question, the relationships from the parent widget table to the child tables are one-to-one.
Dolph
+1  A: 

Normal Form (BCNF / 5NF) is generally the most sound basis for database design unless you find compelling reasons to deviate from it. That means the schema without nulls should be preferred. Normalization reduces redundant data and the potential for anomalies to arise and minimises inbuilt "bias" in the design, making it easier to maintain and extend.

Nulls complicate most operations on the database and cause incorrect results with some queries. Only add nulls to a design where you find some special reason for doing so - typically those reasons are to do with DBMS limitations that won't allow you easily to implement some constraint or other logic without using nulls. Also bear in mind that whenever a database designer adds nulls into the database, an application designer typically has to do extra work to remove or hide them for the benefit of the end user.

You can find extensive discussions on nulls and other issues to do with missing data in Fabian Pascal's book "Practical Issues in Database Management" and in the books of Chris Date and papers by E.F.Codd, Witold Lipski and many others.

dportas