views:

1008

answers:

11

I was experiencing a hard time creating FK relationships in my relational SQL database and after a brief discussion at work, we realized that we have nullable columns which were most likely contributing to the problem. I have always viewed NULL as meaning unassigned, not specified, blank, etc. and have really never seen a problem with that.

The other developers I was speaking with felt that the only way to handle a situation where if a relationship did exist between 2 entities, then you would have to create a table that joins the data from both entities...

It seems intuitive to me at least to say that for a column that contains an ID from another table, if that column is not null, then it must have an ID from the other table, but if it is NULL then that is OK and move on. It seems like this in itself is contradictory to what some say and suggest.

What is the best practice or correct way to handle situations where there could be a relationship between two tables and if a value is specified then it must be in the other table...

A: 

The join table is the correct method.

Nulls in keys indicate bad database design.

A null value is not unassigned/empty/blank/etc, it is missing/unknown data.

Using nulls in a foreign key field does not mean "there's no relation", it means "I don't know if there's a relation or not" - which is clearly bad.

Peter Boughton
Why? I'd be interested why this is a bad design.
Ray
Do you store sentinel rows in each table to signify "not set" foreign keys?
Joe Holloway
[citation neeeded]
Blorgbeard
Because a null is not unassigned/empty, it is missing/unknown data.Using nulls is not saying "there's no relation", it's saying "I don't know if there's a relation or not".
Peter Boughton
Well, I think that's subjective - null can be used to mean "no value". And I don't agree that "I don't know if there's a relationship" is "clearly bad" - what if you *don't* know? Is it bad to use null to mean "unknown" in a non-FK column? If not, why is it bad for an FK?
Blorgbeard
It is bad to have unknown data in a database - because knowing data is the *point* of a database. I personally make an exception for date values, but that's it.
Peter Boughton
You are objectively wrong, Peter. There is a reason why foreign key constraints allow NULL values. It is perfectly valid to have optional relationships. For instance, a node in a tree can have a parent, or it can be the root (in which case the parent will be NULL).
cdonner
It is perfectly possible to implement a rooted tree without nulls - by using a two-column NodeParent. This also allows you to store information about the relationship (if necessary), which can be messy storing it alongside similar data in the original Node table.
Peter Boughton
Of course is it possible, but it is not necessary if you do not need to describe the relationship.
cdonner
From a practical perspective, I really have to agree with cdonner despite Peter's 'purist' argument. It is perfectly reasonable to say that null means that no value has been assigned in this case. Obviously SQL's architects thought it valid or they wouldn't have permitted nulls in FK fields.
Mark Brittingham
The general topic (which you can google for in depth) is "three-valued logic". Which you can choose to deny or reject, but it definitely puts you outside standard practices.
le dorfier
+1  A: 

I don't see a problem with null values if the field can be empty. An abuse is allowing null values when there should be information in that field.

epochwolf
+7  A: 

It's perfectly acceptable, and it means that, if that column has any value, its value must exist in another table. (I see other answers asserting otherwise, but I beg to differ.)

Think a table of Vehicles and Engines, and the Engines aren't installed in a Vehicle yet (so VehicleID is null). Or an Employee table with a Supervisor column and the CEO of the company.

Update: Per Solberg's request, here is an example of two tables that have a foreign key relationship showing that the foreign key field value can be null.

CREATE TABLE [dbo].[EngineTable](
    [EngineID] [int] IDENTITY(1,1) NOT NULL,
    [EngineCylinders] smallint NOT NULL,
 CONSTRAINT [EngineTbl_PK] PRIMARY KEY NONCLUSTERED 
(
    [EngineID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[CarTable](
    [CarID] [int] IDENTITY(1,1) NOT NULL,
    [Model] [varchar](32) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [EngineID] [int] NULL
 CONSTRAINT [PK_UnitList] PRIMARY KEY CLUSTERED 
(
    [CarID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[CarTable]  WITH CHECK ADD CONSTRAINT [FK_Engine_Car] FOREIGN KEY([EngineID])
REFERENCES [dbo].[EngineTable] ([EngineID])


Insert Into EngineTable (EngineCylinders) Values (4);
Insert Into EngineTable (EngineCylinders) Values (6);
Insert Into EngineTable (EngineCylinders) Values (6);
Insert Into EngineTable (EngineCylinders) Values (8);

-- Now some tests:

Insert Into CarTable (Model, EngineID) Values ('G35x', 3);  -- References the third engine

Insert Into CarTable (Model, EngineID) Values ('Sienna', 13);  -- Invalid FK reference - throws an error

Insert Into CarTable (Model) Values ('M');  -- Leaves null in the engine id field & does NOT throw an error
le dorfier
After testing this out, it turns out that you can place nulls in a field that references a foreign key. I don't think I've ever done this in practice but it is legal SQL. Thus, I've deleted my answer and upvoted yours.
Mark Brittingham
@Mark - can you provide the syntax to create a FK with nulls?
RSolberg
Solberg - give me a minute. If le dorfier doesn't mind, I'll edit his example to add the tables (le dorfier?)
Mark Brittingham
Ok - example in place
Mark Brittingham
@Mark - Thanks :)
RSolberg
Thanks Mark - I've been out walking the dog. (I'm able to walk by myself, but if I take the leash I must take one of my dogs.) :D
le dorfier
To view it in simplest terms, you have two constraints you're working with. One is the Vehicle table which has an EngineID column, which may or may not be nullable. Second, you can have an FK constraint on the EngineID column, and declare that its EngineID must exist there. Two separate constraints.
le dorfier
Since I was just putting in an illustration of your point with no extra comments that might change your point, I didn't think you'd mind the addition. It also helps, of course, to have run across your answers and to have corresponded in the past ;-)
Mark Brittingham
I think the analogy is flawed. You wouldn't normally sell a car without an engine. And if your domain allowed for selling cars without engines, engineId wouldn't be a part of that table because it is not a necessary characteristic of the car.
jlembke
what @jlembke said + if you are selling a car without an engine you should communicate this to your users by letting them no there is no engine, this is a valid business rule (fk would be to an id that would specify 'no engine' or 'none'
Jon Erickson
Then you're making up artificial database artifacts just to satisfy your artificial constraints. The reasons for being no engine are various and legitimate. How do you ever handle manufacturing cases where the Vehicle and Engine don't jump into existence simultaneously (which never happens?)
le dorfier
One good design would be to have a Boolean column IsEngineInstalled, to explicitly express this progression. Another would be to have an InstallEngine event table, wherein there would be no record yet. But then the EngineID is null until this step occurs - it doesn't need some bogus non-NULL value.
le dorfier
+6  A: 

I think this debate is another byproduct of the object-relational impedence mismatch. Some DBA-types will pedantically say never allow null in a FK based on some deeper understanding of relational algebra semantics, but application developers will argue that it makes their domain layer more elegant.

The use cases for a "not yet established" relationship are valid, but with null FKs some find that it adds complexity to their queries by introducing more sophisticated features of SQL, specifically LEFT JOINs.

One common alternative solution I've seen is to introduce a "null row" or "sentinel row" into each table with pk=0 or pk=1 (based on what's supported by your RDBMS). This allows you to design a domain layer with "not yet established" relationships, but also avoid introducing LEFT JOINs as you're guaranteeing there will always be something to join against.

Of course, this approach requires diligence too because you're basically trading off LEFT JOINs for having to check the presence of your sentinel row in queries so you don't update/delete it, etc. Whether or not the trade offs are justified is another thing. I tend to agree that reinventing null just to avoid a fancier join seems a bit silly, but I also worked in an environment where application developers don't win debates against DBAs.

Edits

I removed some of the "matter of fact" wording and tried to clarify what I meant by "failing" joins. @wcoenen's example is the reason that I've personally heard most often for avoiding null FKs. It's not that they fail as in "broken", but rather fail--some would argue--to adhere to the principle of least surprise.

Also, I turned this response into a wiki since I've essentially butchered it from its original state and borrowed from other posts.

Joe Holloway
You meant "select * from engine where vehicle_id is not null;", right?
Blorgbeard
Nice answer btw, it's nice to see some reasoning for this POV :)
Blorgbeard
Yes, fixed and thanks
Joe Holloway
Why do you say that joins will fail, if you can't name an example? Joins involving NULL FKs behave exactly as they should - nothing will be joined.
cdonner
Agree with cdonner - I don't get it.
le dorfier
I understand the point you're making, but I think that it is appropriate that the join fail. It's not a problem of having a null, more a problem of logically (mis)understanding what it means to project rows from a join of multiple tables.
David Aldridge
Why would someone vote this as "offensive"?
Mark Brittingham
+1  A: 

You got it right. For an FK a NULL means no value (meaning no relationship). If there is a value in an FK it has to match exactly one value in the PK that it references.

It is not necessarily bad design to permit this. If a relationship is one-to-many and optional, it's perfectly OK to add a FK to the table on the one side, referencing the PK on the many side.

If a relationship is many-to-many it requires a table of its own, called a junction table. This table has two FKs, each referencing a PK in one of the tables being related. In this case an omitted relationship can be expressed by simply omitting an entire row from the junction table.

Some people design so as to avoid the necessity of permitting NULLS. These people will use a junction table for a many-to-one relationship, and omit a row, as above, when a relationship is omitted.

I don't follow this practice myself, but it does have certain benefits.

Walter Mitty
+2  A: 

Suppose you would need to generate a report of all customers. Each customer has a FK to a country and the country data needs to be included in the report. Now suppose you allow the FK to be null, and you do the following query:

SELECT * FROM customer, country WHERE customer.countryID = country.ID

Any customer where the country FK is null would be silently omitted from the report (you need to use LEFT JOIN instead to fix it). I find this unintuitive and surprising, so I don't like NULL FKs and avoid them in my database schemas. Instead I use sentinel values, e.g. a special "unkown country".

Wim Coenen
Personally, I very *very* rarely use any join type except for left join. I consider them the default.
Blorgbeard
+1  A: 
CREATE TABLE [tree]
{
    [id] int NOT NULL,
    [parent_id] int NULL
};

ALTER TABLE [tree] ADD CONSTRAINT [FK_tree_tree] FOREIGN KEY([parent_id])
REFERENCES [tree] ([id]);

There is nothing wrong with this! The root node will eternally have a NULL parent, and this is not a case of a "not yet established" relationship. No problem with joins here, either.

Having the root node point to itself as the parent to avoid the NULL FK, or any other creative workaround, means that the real world is no longer accurately modeled in the database.

The one potential issue that nobody mentioned is with index performance on columns that contain lots of NULL values. This per se has nothing to do with the foreign key question, though, but it can make joins perform poorly.

I do understand that if you are a DBA working with ultra-large databases that have hundreds of millions of rows, you would not want NULL foreign keys, because they would simply not perform. The truth is, though, that most developers will never work with such large databases in their lifetime, and today's databases can handle such a situation just fine with a few hundred thousand rows. To stress a (poor) metaphor, most of us so not drive F1 race cars, and the automatic transmission in my wife's Accord does what it needs to do just fine (or at least, it used to, until it broke a few weeks ago ...).

cdonner
+4  A: 

The problem with allowing nulls in foreign key columns arises when the foreign key is composite. What does it mean if one of the two columns is null? Does the other column have to match anything in the referenced table? With simple (single-column) foreign key constraints, you can get away with nulls.

On the other hand, if the relationship between the two tables is conditional (both entities can exist in their own right, but may almost coincidentally be related) then it may be best to model that with a 'joining table' - table that contains a FK to the referenced table and another to the referencing table and that has its own primary key as the combination of two FKs.

As an example of a joining table, suppose your database has tables of clubs and people. Some of the people belong to some of the clubs. The joining table would be club_members and would contain an FK for the person referencing the 'people' table, and would contain another FK for the club that the person belongs to, and the combination of identifiers for person and club would be the primary key of the joining table. (Another name for joining table is 'association' or 'associative' table.)

Jonathan Leffler
+1 Good observation
Mark Brittingham
@Jonathan - I would think that if you have a composite foreign key, then this argument is essentially not applicable as it simply wouldn't work... +1 though, very helpful.
RSolberg
+5  A: 

I'm strongly supportive of the arguments for NULLs in foreign keys to indicate no-parent in an OLTP system, but in a decision support system it rarely works well. There the most appropriate practice is to use a special "Not Applicable" (or similar) value as the parent (in the dimenson table) to which the child records (in the fact table) can link.

The reason for this is that the exploratory nature of drill-down/across etc can lead to users not understanding how a metric can change when they have merely asked for more information on it. For example where a finance data mart includes a mix of product sales and other sources of revenue, drilling down to "Type of Product" ought to classify non-product sale related data as such, rather than letting those numbers drop out of the report because there is no join from the fact table to the product dimension table.

David Aldridge
+3  A: 

I would lean toward a design that communicates the meaning of that column. A null could mean any number of things as far as the domain is concerned. Putting a value in the related table that says "Not Needed", or "Not Selected" at least communicates the purpose without having to ask a developer or consult a document.

jlembke
But at the expense of having to make sure your reports do not output "Not Selected", if this column drives certain things... Make sure "Not Selected" shows up at the top of the drop down menus, etc.
RSolberg
You many times have to order those menus anyway since you may not be in control of how new values are added. Plus if for some reason you want to see the items that don't have a relationship, you are querying for null instead of a meaningful value.I agree with Peter, null means "I don't know"
jlembke
+3  A: 

If you are assigning NULL to a Business Reason then you are essentially redefining what NULL means in your domain and must document that for users and future developers. If there is a Business Reason for having NULL as a foreign key then I would suggest you do as others have mentioned and add a joining record that has a value of something along the lines of 'N/A' or 'Not Assigned'.

Also there could be complications when NULL in your database now becomes multiple meanings (Business Meaning, Something Error'd or Wasn't inputed correctly) which can cause issues to be more difficult to track down.

Jon Erickson