There's a school of thought that null values should not be allowed in a relational database. That is, a table's attribute (column) should not allow null values. Coming from a software development background, I really don't understand this. It seems that if null is valid within the context of the attribute, then it should be allowed. This is very common in Java where object references are often null. Not having an extensive database experience, I wonder if I'm missing something here.
Nulls are negatively viewed from the perspective of database normalization. The idea being that if a value can be nothing, then you really should split that out into another sparse table such that you don't require rows for items which have no value.
It's an effort to make sure all data is valid and valued.
In some cases having a null field is useful, though, especially when you want to avoid yet another join for performance reasons (although this shouldn't be an issue if the database engine is setup properly, except in extraordinary high performance scenarios.)
There is nothing wrong with using NULL for data fields. You have to be careful when setting keys to null. Primary keys should never be NULL. Foreign keys can be null but you have to be careful not to create orphan records.
If something is "non existent" then you should use NULL instead of an empty string or other kind of flag.
According to strict relational algebra, nulls are not needed. However for any practical project, they are needed.
First, much real-world data is unknown or not applicable and nulls implement that behavior well. Second, they make views and outer joins much more practical.
It depends.
As long as you understand why you are allowing NULL
s in the database (the choice needs to be made on a per-column basis) AND how you will interpret, ignore or otherwise deal with them, they are fine.
For instance, a column like NUM_CHILDREN
- what do you do if you don't know the answer - it should be NULL
. In my mind, there is no other best option for this column's design (even if you have a flag to determine whether the NUM_CHILDREN
column is valid, you still have to have a value in this column).
On the other hand, if you don't allow NULL
s and have special reserved values for certain cases (instead of flags), like -1 for number of children when it is really unknown, you have to address these in a similar way, in terms of conventions, documentation, etc.
So, ultimately, the issues have to be addressed with conventions, documentation and consistency.
The alternative, as apparently espoused by Adam Davis in the above answer, of normalizing the columns out to sparse (or not so sparse, in the case of the NUM_CHILDREN
example or any example where most of the data has known values) tables, while able to eliminate all NULLs, is non-workable in general practice.
In many cases where an attribute is unknown, it makes little sense to join to another table for each and every column which could allow NULL
s in a simpler design. The overhead of joins, the space requirements for theprimary keys make little sense in the real world.
This brings to mind the way duplicate rows can be eliminated by adding a cardinality column, while it theoretically solves the problem of not having a unique key, in practice that is sometimes impossible - for instance, in large scale data. The purists are then quick to suggest a surrogate PK instead, yet the idea that a meaningless surrogate can form part of a tuple (row) in a relation (table) is laughable from the point of view of the relational theory.
You'll find with step-by-step data acquisition systems that you can't avoid having nulls in a database because the order of asking questions / data gathering very rarely matches the logical data model.
Or you can default the values (requiring code to handle these default values). You can assume all strings are empty instead of null, for example, in your model.
Or you can have staging database tables for data acquisition that continues until all the data is obtained before you populate the actual database tables. This is a lot of extra work.
I would say that Nulls should definitely be used. There is no other right way to represent lack of data. For example, it would be wrong to use an empty string to represent a missing address line, or it would be wrong to use 0 to represent a missing age data item. Because both an empty string and 0 are data. Null is the best way to represent such a scenario.
One argument against nulls is that they don't have a well-defined interpretation. If a field is null, that could be interpreted as any of the following:
- The value is "Nothing" or "Empty set"
- There is no value that makes sense for that field.
- The value is unknown.
- The value hasn't been entered yet.
- The value is an empty string (for databases that don't distinguish between nulls and empty strings).
- Some application-specific meaning (e.g., "If the value is null, then look at another table to find the value.")
- An error has occurred, causing the field to have a null value when it really shouldn't.
Some schema designers demand that all values and data types should have well-defined interpretations, therefore nulls are bad.
To a database, null translates to "I don't have a value for this". Which means that (interestingly), a boolean column that allows nulls is perfectly acceptable, and appears in many database schemas. In contrast, if you have a boolean in your code that can have a value of 'true', 'false' or 'undefined', you're likely to see your code wind up on thedailywtf sooner or later :)
So yes, if you need to allow for the possibility of a field not having any value at all, then allowing nulls on the column is perfectly acceptable. It's significantly better than the potential alternatives (empty strings, zero, etc)
Best thing to know about Normal Forms is that they are guides and guides should not be doggedly adhered to. When the world of academia clashes with the actual world you seldom find many surviving warriors of acedemia.
The answer to this question is that its ok to use nulls. Just evaluate your situation and decide if you want them to show up in the table or collapse the data into another related table if you feel you ratio of null values to actual values is too high.
As a friend is fond of saying, "Don't let the perfect be the enemy of the good". Think Voltaire also said that. 8)
Personally, I think that nulls should only be used when you are using the field as a foreign key to another table, to symbolize that this record doesn't link to anything in the other table. Other than that, I find that null values are actually very troublesome when programming application logic. Because there is no direct representation of a database null in most programming languages for many data types, it ends up creating a lot of application code to deal with the meaning of these null values. When a DB encounters null integer, and tries, for instance, add a value of 1 to it (aka null + 1), the database will return null, as that is how the logic is defined. However, when a programming language tries to add null and 1, it will usually thrown an exception. So, your code ends up littered with checks of what to do when the value is null, which often just equates to converting to 0 for numbers, empty string for text, and some null date (1900/1/1?) for date fields.
This is a huge can of worms, because NULL can mean so many things:
- No date of death because the person is still alive.
- No cell phone number because we don't know what it is or even if it exists.
- No social security number because that person is know to not have one.
Some of these can be avoided by normalisation, some of them can be avoided by the presence of a value in that column ("N/A"), some of them can be mitigated by having a separate column to explain the presence of the NULL ("N/K", "N/A" etc).
It's also a can of worms because the SQL syntax needed to find them is different to that of non-null values, it's difficult to join on them, and they are generally not included in index entries.
Because of the former reason you're going to find cases where a null is unavoidable.
Because of the latter reason you should still do your best to minimise the number of them.
Regardless, always use NOT NULL constraints to guard against nulls where a value is required.
Instead of writing up all the issues of NULL, and tristate vs boolean logic, etc. - I'll offer this pithy advice:
Don't allow NULL in your columns, until you find yourself adding a magic value to represent missing or incomplete data.
Since you're asking this question, you should be very careful in how you approach NULL. There's a lot of nonobvious pitfalls to it. When in doubt, don't use NULL.
Nulls can be hard to work with, but they make sense in some cases.
Suppose you have an invoice table with a column "PaidDate" which has a date value. What do you put in that column before the invoice has been paid (assuming you don't know beforehand when it will be paid)? It can't be an empty string, because that's not a valid date. It doesn't make sense to give it an arbitrary date (e.g. 1/1/1900) because that date simply isn't correct. It seems the only reasonable value is NULL, because it does not have a value.
Working with nulls in a database has a few challenges, but databases handle them well. The real problems are when you load nulls from your database into your application code. That's where I've found that things are more difficult. For example, in .NET, a date in a strongly typed dataset (mimicking your DB structure) is a value type and cannot be null. So you have to build workarounds.
Avoid nulls when you can, but don't rule them out because they have valid uses.
There is another alternative to using "N/A" or "N/K" or the empty string - a separate table.
E.g. if we may or may not know a customer's phone number:
CREATE TABLE Customer (ID int PRIMARY KEY, Name varchar(100) NOT NULL, Address varchar(200) NOT NULL);
CREATE TABLE CustomerPhone (ID int PRIMARY KEY, Phone varchar(20) NOT NULL, CONSTRAINT FK_CustomerPhone_Customer FOREIGN KEY (ID) REFERENCES Customer (ID));
If we don't know the phone number we just don't add a row to the second table.
One gotcha if you are using an Oracle database. If you save an empty string to a CHAR type column then Oracle will coerce the value to be NULL without asking. So it can be quite difficult to avoid NULL values in string columns in Oracle.
If you are using NULL values, learn to use the SQL command COALESCE, especially with string values. You can then prevent NULL values propogating into your programming language. For example, imagine a person having a FirstName, MiddleName and FamilyName but you want to return a single field;
SELECT FullName = COALESCE(FirstName + ' ', '') + COALESCE(MiddleName+ ' ', '') + COALESCE(FamilyName, '') FROM Person
If you don't use COALESCE, if any column contains a NULL value you get NULL returned.
I think you're confusing Conceptual Data Modeling with Physical Data Modeling.
In CDM's if an object has an optional field, you should subtype the object and create a new object for when that field is not null. That's the theory in CDMs
In the physical world we make all sorts of compromises for the real world. In the real world NULLS are more than fine, they are essential
Related question: How do I enforce data integrity rules in my database?
I initially started with many small tables with almost zero nullalbe fields. Then I learned about the LINQ to SQL IsDiscriminator property and that LINQ to SQL only supports single table inheritance. Therefore I re-engineered it as a single table with lots of nullalbe fields.
Technically, nulls are illegal in relational math on which the relational database is based. So from a purely technical, semantic relational model point of view, no, they are not okay.
In the real world, denormalization and some violations of the model are okay. But, in general, nulls are an indicator that you should look at your overall design more closely.
I am always very wary of nulls and try to normalize them out whenever I can. But that doesn't mean that they aren't the best choice sometimes. But I would definitely lean to the side of "no nulls" unless you are really sure that having the nulls is better in your particular base.
It all comes down to normalization versus ease of use and performance issues.
If you are going to stick to complete normalization rules you are going to end up writing stuff that looks like:
Select c.id, c.lastname,....... from customer c left join customerphonenumber cpn on c.id = cpn.customerid left join customeraddress ca on c.id = ca.customerid left join customerphonenumber2 cpn2 on c.id = cpn2.customerid etc, etc, etc
I think the question comes down to what you interpret a value of NULL to signify. Yes, there are many interpretations for a NULL value, however some of them posted here should never be used. The true meaning of NULL is determined by the context of your application and should never mean more than one thing. For example, one suggestion was that NULL on a date of birth field would indicate the person was still alive. This is dangerous.
In all simplicity, define NULL and stick to it. I use it to mean "the value in this field is unknown at this time". It means that and ONLY that. If you need it to mean something else AS WELL, then you need to re-examine your data model.
I agree with many of the answers above and also believe that NULL can be used, where appropriate, in a normalized schema design - particularly where you may wish to avoid using some kind of "magic number" or default value which, in turn, could be misleading!
Ultimately though, I think usage of null needs to be well thought out (rather than by default) to avoid some of the assuptions listed in the answers above, particularly where NULL might be assumed to be 'nothing' or 'empty', 'unknown' or the 'value hasn't been entered yet'.
NULL rocks. If it wasn't necessary in some cases, SQL would not have IS NULL and IS NOT NULL as special-case operators. NULL is the root of the conceptual universal, all else is NOT NULL. Use NULLs freely, whenever it may be possible for a data value to be absent but not missed. Default values can only compensate for NULL if they are absolutely correct all of the time. For example, if i have a single-bit field "IsReady" it may make perfect sense for this field to have a default value of false and NULL not be allowed, but this implicitly asserts that we know that the whatever is not ready, when in fact we may have no such knowledge. Chances are, in a workflow scenario, the person who decides ready-or-not just hasn't had the chance to enter their opinion yet, so a default of false could actually be dangerous, leading them to overlook a decision that appears to have been made but was in fact only defaulted.
as an aside, and in reference to the middle-initial example, my father had no middle name, therefore his middle initial would be NULL - not blank, space, or asterisk - except in the Army where his middle initial was NMI = No Middle Initial. How silly was that?
There are several different objections to the use of NULL. Some of the objections are based on database theory. In theory, there is no difference between theory and practice. In practice, there is.
It is true that a fully normalized database can get along without NULLS at all. Any place where a data value has to be left out is a place where an entire row can be left out with no loss of information.
In practice, decomposing tables to this extent serves no great useful purpose, and the programming needed to perform simple CRUD operations on the database become more tedious and error prone, rather than less.
There are places where the use of NULLS can cause problems: essentially these revolve around the following question: what does missing data really mean? All a NULL really conveys is that there is no value stored in a given field. But the inferences application programmers draw from missing data are sometimes incorrect, and that causes a lot of problems.
Data can be missing from a location for a variety of reasons. Here are a few:
The data is inapplicable in this context. e.g. spouse's first name for a single person.
The user of a data entry form left a field blank, and the application does not require an entry in the field.
The data is copied to the database from some other database or file, and there was missing data in the source.
There is an optional relationship encoded in a foreign key.
An empty string was stored in an Oracle database.
Here are some guidelines about when to avoid NULLS:
If in the course of normal expected programming, query writers have to write a lot of ISNULL, NV, COALESCE, or similar code in order to substitute a valid value for the NULL. Sometimes, it's better to make the substitution at store time, provided what's being stored is "reality".
If counts are likely to be off because rows containing a NULL were counted. Often, this can be obviated by just selecting count(MyField) instead of count(*).
Here is one place where you by golly better get used to NULLS, and program accordingly: whenever you start using outer joins, like LEFT JOIN and RIGHT JOIN. The whole point behind an outer join as distinct from an inner join is to get rows when some matching data is missing. The missing data will be given as NULLS.
My bottom line: don't dismiss theory without understanding it. But learn when to depart from theory as well as how to follow it.
While technically NULLs are ok as a field value, they are quite frequently frowned upon. Depending on how data is written to your database, it is possible (and common) to end up with an empty string value in the field as opposed to a NULL. So, any query that has this field as part of the WHERE clause, would need to handle both scenarios which are needless keystrokes.
null means no value while 0 doesn't, if you see a 0 you don't know the meaning, if you see a null you know it is a missing value
I think nulls are much clearer, 0 and '' are confusing since they don't clearly show the intent of the value stored
The main issue with nulls is that they have special semantics that can produce unexpected results with comparisons, aggregates and joins.
Nothing is ever equal to null, and nothing is ever not equal to, greater than or less than null, so you have to set nulls to a placeholder value if you want do any bulk comparison.
This is also a problem on composite keys that might be used in a join. Where the natural key includes a nullable column you might want to consider using a synthetic key.
Nulls can drop out of counts, which may not be the semantics you desire.
Nulls in a column that you can join against will eliminate rows from an inner join. In general this is probably desired behaviour, but it can lay elephant traps for people doing reporting.
There are quite a few other subtleties to nulls. Joe Celko's SQL for Smarties has a whole chapter on the subject and is a good book and worth reading anyway. Some examples of places where nulls are a good solution are:
Optional relationships where a joined entity may or may not be present. Null is the only way to represent an optional relationship on a foreign key column.
Columns that you may wish to use to null to drop out of counts.
Optional numeric (e.g. currency) values that may or may not be present. There is no effective placeholder value for 'not recorded' in number systems (particularly where zero is a legal value), so null is really the only good way to do this.
Some examples of places where you might want to avoid using nulls because they are likely to cause subtle bugs.
'Not Recorded' values on code fields with a FK against a reference table. Use a placeholder value, so you (or some random business analyst down the track) don't inadvertently drop rows out of result sets when doing a query against the database.
Description fields where nothing has been entered - null string (
''
) works fine for this. This saves having to treat the nulls as a special case.Optional columns on a reporting or data warehouse system. For this situation, make a placeholder row for 'Not Recorded' in the dimension and join against that. This simplifies querying and plays nicely with ad-hoc reporting tools.
Again, Celko's book is a good treatment of the subject.
Don't underestimate the complexity you create by making a field NULLable. For example, the following where clause looks like it will match all rows (bits can only be 1 or 0, right?)
where bitfield in (1,0)
But if the bitfield is NULLable, it will miss some. Or take the following query:
select * from mytable
where id not in (select id from excludetable)
Now if the excludetable contains a null and a 1, this translates to:
select * from mytable
where id <> NULL and id <> 1
But "id <> NULL" is false for any value of id, so this will never return any rows. This catches even experienced database developers by surpise.
Given that most people can be caught off-guard by NULL, I try to avoid it when I can.
It seems that if null is valid within the context of the attribute, then it should be allowed.
But what does null mean? That's the rub. It's "no value", but there's a dozen different reasons there might be no value there, and "null" doesn't give you any clue which one it means in this case. (Not set yet, not applicable to this instance, not applicable to this type, not known, not knowable, not found, error, program bug, ...)
This is very common in Java where object references are often null.
There's a school of thought that says null references there are bad there, too. Same problem: what does null mean?
IIRC, Java has both "null" and "uninitialized" (though no syntax for the latter). So Gosling realized the folly of using "null" for every kind of "no value". But why stop with just two?
As an analyst/programmer with 30 years experience I'll just say NULLs should be taken out back and put out of their misery.
-1, 01/01/0001/12/31/9999 and ? will all suffice just as well without the mind distorting code needed to cope with these nasty NULLs.
Dont take my words sarcastic, I mean it. Unless you are working with toy databases NULLs are inevitable and in realworld we cannot avoid NULL values.
Just for saying how can you have first name, middle name, last name for every person. (Middle name and Last name is optional, then in that case NULLs are there for you) and how you can have Fax,Business phone,Office phone for everybody in the blog list.
NULLS are fine, and you have to handle them properly when retrieval. In SQL server 2008 there is a concept of Sparse columns where you can avoid the space taken for NULLs also.
Dont confuse NULLs with Zeros and any other value. People do that any say it is right.
Thanks Naveen