Some while ago, I've been reading through the book SQL and Relational Theory by C. J. Date. The author is well-known for criticising SQL's three-valued logic (3VL).1)
The author makes some strong points about why 3VL should be avoided in SQL, however he doesn't outline how a database model would look like if nullable columns weren't allowed. I've thought on this for a bit and have come up with the following solutions. If I missed other design options, I would like to hear about them!
1) Date's critique of SQL's 3VL has in turn been criticized too: see this paper by Claude Rubinson (includes the original critique by C. J. Date).
Example table:
As an example, take the following table where we have one nullable column (DateOfBirth
):
# +-------------------------------------------+
# | People |
# +------------+--------------+---------------+
# | PersonID | Name | DateOfBirth |
# +============+--------------+---------------+
# | 1 | Banana Man | NULL |
# +------------+--------------+---------------+
Option 1: Emulating NULL
through a flag and a default value:
Instead of making the column nullable, any default value is specified (e.g. 1900-01-01
). An additional BOOLEAN
column will specify whether the value in DateOfBirth
should simply be ignored or whether it actually contains data.
# +------------------------------------------------------------------+
# | People' |
# +------------+--------------+----------------------+---------------+
# | PersonID | Name | IsDateOfBirthKnown | DateOfBirth |
# +============+--------------+----------------------+---------------+
# | 1 | Banana Man | FALSE | 1900-01-01 |
# +------------+--------------+----------------------+---------------+
Option 2: Turning a nullable column into a separate table:
The nullable column is replaced by a new table (DatesOfBirth
). If a record doesn't have data for that column, there won't be a record in the new table:
# +---------------------------+ 1 0..1 +----------------------------+
# | People' | <-------> | DatesOfBirth |
# +------------+--------------+ +------------+---------------+
# | PersonID | Name | | PersonID | DateOfBirth |
# +============+--------------+ +============+---------------+
# | 1 | Banana Man |
# +------------+--------------+
While this seems like the better solution, this would possibly result in many tables that need to be joined for a single query. Since OUTER JOIN
s won't be allowed (because they would introduce NULL
into the result set), all the necessary data could possibly no longer be fetched with just a single query as before.
Question:
Are there any other options for eliminating NULL
(and if so, what are they)?