ansaurus

Question

Options for eliminating NULLable columns from a DB model (in order to avoid SQL's three-valued logic)?

Answer 1

+1 A:

I haven't read it, but there's an article called How To Handle Missing Information Using S-by-C on the the Third Manifesto website that's run by Hugh Darwen and C.J. Date. This isn't written by C.J. Date, but I'd assume that since it's one of the articles on that website it's probably similar to his opinions.

ho1 2010-06-20 16:12:58

There's some interesting information on that website. However, the specific paper that you mention seems more academic in nature: It gives examples in the Tutorial D language and not in SQL, which is what's actually used today. (+1 for the link to the website.)

stakx 2010-06-20 16:49:50

@stakx: Ah, doesn't surprise me I suppose, I found the website a while ago after reading another question on SO but I found it all a bit too academical to actually read through anything.

ho1 2010-06-20 16:51:22

Answer 2

+1 A:

One alternative may be the entity-attribute-value model:

 entity  attribute    value
 1       name         Banana Man
 1       birthdate    1968-06-20

If the birthdate was unknown, you'd just omit its row.

Andomar 2010-06-20 16:20:52

I guess the problem with this model is that you have to choose a single type for column `value` that is appropriate for all sorts of things (ie. this would require parsing strings or even `BLOB`s on the client side).

stakx 2010-06-20 16:24:23

@stakx: You could add a `type` column along with `str_value`, `int_value`, `real_value` columns. And I think SQLIte supports EAV natively and store integers, floats and strings in the same column efficiently

Andomar 2010-06-20 16:40:28

@Andomar: I didn't know that SQLite has native support for the EAV model. It's an interesting feature, despite being a non-standard (?) one that other RDBMS might not have. Thanks for the info!

stakx 2010-06-20 16:47:14

EAV is at least controversial - check out http://www.simple-talk.com/sql/t-sql-programming/avoiding-the-eav-of-destruction/ or http://www.simple-talk.com/sql/database-administration/five-simple--database-design-errors-you-should-avoid/ for some good reason why you should at least be careful when choosing the EAV approach

marc_s 2010-06-20 17:13:47

Answer 3

+2 A:

I saw Date's colleague Hugh Darwen discuss this issue in an excellent presentation "How To Handle Missing Information Without Using NULL", which is available on the Third Manifesto website.

His solution is a variant on your second approach. It's sixth normal form, with tables to hold both Date of Birth and identifiers where it is unknown:

#  +-----------------------------+ 1    0..1 +----------------------------+
#  |         People'             | <-------> |         DatesOfBirth       |
#  +------------+----------------+           +------------+---------------+
#  |  PersonID  |  Name          |           |  PersonID  |  DateOfBirth  |
#  +============+----------------+           +============+---------------+
#  |  1         |  Banana Man    |           ! 2          | 20-MAY-1991   |
#  |  2         |  Satsuma Girl  |           +------------+---------------+
#  +------------+----------------+
#                                  1    0..1 +------------+
#                                  <-------> | DobUnknown |
#                                            +------------+
#                                            |  PersonID  |
#                                            +============+
#                                            | 1          |
#                                            +------------+

Selecting from People then requires joining all three tables, including boilerplate to indicate the unknown Dates Of Birth.

Of course, this is somewhat theoretical. The state of SQL these days is still not sufficiently advanced to handle all this. Hugh's presentation covers these shortcomings. One thing he mentions is not entirely correct: some flavours of SQL do support multiple assignment - for instance Oracle's INSERT ALL syntax.

APC 2010-06-20 19:47:43

I've read the paper on Third Manifesto and quite like the solution, mainly because it resolves the ambiguity of what exactly `NULL` means. While this separation of different meanings of `NULL` into separate tables improves data quality, I agree with you that actually querying the data will become more difficult.

stakx 2010-06-20 19:59:27

Answer 4

+1 A:

Option 3: Onus on the record writer:

CREATE TABLE Person
(
  PersonId int PRIMARY KEY IDENTITY(1,1),
  Name nvarchar(100) NOT NULL,
  DateOfBirth datetime NOT NULL
)

Why contort a model to allow null representation when your goal is to eliminate them?

David B 2010-06-22 19:57:45

*Every* solution to my question would eventually have `NOT NULL` on all columns (where applicable). But the real question -- if you don't reduce my post to just the last sentence -- was *how* a relational data model would have to be modified **so that it can still accomodate missing or unknown information** for some attributes. Your solution however completely *disallows* missing information for the `DateOfBirth` column and thus misses the most important bit.

stakx 2010-06-22 20:50:27

If you still permit the third value, you must still write logic about the third value. I don't know why your implementation of the third value would be better than any db vendor's implementation.

David B 2010-06-22 23:39:29

@David B: I want to get *rid* of the third value (`UNKNOWN`), *but* at the same time still be able to represent the fact that some information is missing (or unknown, or whatever). This is obviously possible. I am investigating different ways of doing this. You're perhaps saying, "Why not just use `NULL`, it's there for just that purpose?" -- Because the 3VL *can* lead to confusing and unintuitive queries and query results. Even without 3VL, one of course still has to deal *somehow* with possibly missing information -- but doing this with 2VL might prevent some logic errors.

stakx 2010-06-23 18:28:49

It's the concept of null that makes the queries confusing and unintuitive, not the mechanism of null. As long as the concept is allowed, you will have to struggle against it.

David B 2010-06-23 19:28:42

ansaurus

tags:

views:

answers:

Options for eliminating NULLable columns from a DB model (in order to avoid SQL's three-valued logic)?

related questions