What are the pros and cons of using NULL values in SQL as opposed to default values?

PS. Many similar questions have been asked on here, but none answers my question.

+4  A: 

To me, they are somewhat orthogonal.

Default values allow you to gracefully evolve your database schema (think adding columns) without having to modify client code. Plus, they save some typing, but relying on default values for this is IMO bad.

Nulls are just that: nulls. A missing value, and a huge PITA when dealing with three-valued logic.

Anton Gogolev
A missing value is a value in and of itself. There are plenty of use cases where "no value" carries specific meaning, and substituting "magic values" (like -99999) in place of null doesn't simplify anything; the consuming code has to check either "if X.HasValue()" or "if X == -99999".
STW
+7  A: 

A NULL in a database is a system marker that takes up very little storage (how much depends on the DBMS) and indicates that a value is not present, as opposed to a space, a zero or any other default value. A field containing NULL means that the content of this cell is unknown at the time of looking at it. A column that allows NULLs also allows rows to be inserted with no value at all in that column. There are several pros and cons of using NULLs as opposed to default values:

Pros

NULL has no data type of its own, so it can be stored in a nullable column of any type. Default values, on the other hand, need to have their data type specified, and a default value in one column might look the same as one in another column yet be of a different type.

NULL is often used in schemas where a value is optional. It is a convenient method for omitting data entry for unknown fields without having to implement additional rules, like storing negative values in an integer field to represent omitted data.

Since a NULL typically takes up very little storage (often a single bit or byte, depending on the DBMS), NULLs may be useful when optimising the database. Storing a NULL can be more efficient than storing a default value, e.g. 8 bits for a character or 16 bits for a small integer.

While your system requirements may change over time, and the default value types with them, NULL is always NULL, so there is no need to update the type of data.

Declaring columns NOT NULL in a table schema can also help with validation, in the sense that a column with a NOT NULL constraint requires a value to be inserted. Default values do not have this capability.

Cons

NULL values are easily confused with empty character strings, which return a blank value to the user when selected. In this sense, default values are less confusing and are the safer option, unless the default value is set to the empty string.

If NULL values are allowed in the database, they may cost the designer some extra time and work, as they can make the database logic more complicated, especially when there are a lot of comparisons against NULL.

Source: Pro and cons

R van Rijn
+10  A: 

I don't know why you're even trying to compare these two cases. NULL means that a column is empty/has no value, while a default value gives a column some value when we don't set it directly in the query.

Maybe an example will be a better explanation. Let's say we have a member table. Each member has an ID and a username. Optionally, a member might have an e-mail address (but doesn't have to). Each member also has a postCount column (which is increased every time the user writes a post). So the e-mail column can hold NULL (because e-mail is optional), while the postCount column is NOT NULL but has a default value of 0 (because a newly created member doesn't have any posts).
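A minimal sketch of that member table, assuming SQLite driven from Python's standard sqlite3 module (the column types and the sample username are illustrative assumptions, not part of the description above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE member (
        id        INTEGER PRIMARY KEY,
        username  TEXT    NOT NULL,
        email     TEXT,                       -- optional, so NULL is allowed
        postCount INTEGER NOT NULL DEFAULT 0  -- new members have no posts yet
    )
""")

# Neither email nor postCount is mentioned in the INSERT:
# email is left NULL, postCount falls back to its declared default.
conn.execute("INSERT INTO member (username) VALUES ('alice')")

row = conn.execute("SELECT username, email, postCount FROM member").fetchone()
print(row)  # ('alice', None, 0)
```

Python reports the SQL NULL as None, while postCount comes back as a real 0 that can be incremented without any NULL checks.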

Crozin
Because I don't fully understand the concept of using these two. Thank you.
Registered User
+4  A: 

NULL values are meant to indicate that the attribute is either not applicable or unknown. There are religious wars fought over whether they're a good thing or a bad thing but I fall in the "good thing" camp.

They are often necessary to distinguish known values from unknown values in many situations and they make a sentinel value unnecessary for those attributes that don't have a suitable default value.

For example, whilst the default value for a bank balance may be zero, what is the default value for a mobile phone number? You may need to distinguish between "customer has no mobile phone" and "customer's mobile number is not (yet) known", in which case a blank column won't do (and having an extra column to decide whether that column is one or the other is not a good idea).

Default values are simply what the DBMS will put in a column if you don't explicitly specify it.

paxdiablo
000-000-0000 or 555-555-5555 or any other invalid phone number is a good default phone number; anything you can test against is just as good as testing against NULL in theory, but much easier in practice.
fuzzy lollipop
I disagree, fuzzy. What you are using is a sentinel, a fake real value to indicate metadata about the field. There are cases where all possible values are valid and none can be used as a sentinel. In addition, it's no more difficult to put "is null" in your queries than "= '000-000-0000'" (and usually more space efficient to store the null) so I'm not sure what trouble you have with NULL that makes it harder.
paxdiablo
+6  A: 

Null values are not ... values!

Null means 'has no value'. Beyond the database aspect, one important property of non-valued variables or fields is that it is not possible to use '=' (or '>', '<') to compare them.

Writing something like (VB):

if myFirstValue = mySecondValue

will not return either True or False if one or both of the variables are non-valued. You will have to use a workaround such as:

if (isnull(myFirstValue) and isNull(mySecondValue)) or myFirstValue = mySecondValue

The 'usual' code used in such circumstances,

if Nz(myFirstValue) = Nz(mySecondValue, defaultValue)

is not strictly correct, as non-valued variables will be considered 'equal' to the 'defaultValue' value (usually a zero-length string).

In spite of this unpleasant behaviour, never, never, never set your default values to a zero-length string (or 0) without a good reason, and easing value comparison in code is not a good reason.
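The same three comparisons can be tried on the SQL side. This is only a sketch, assuming SQLite through Python's sqlite3 module: COALESCE stands in for Nz, SQLite's null-safe IS operator stands in for the IsNull workaround, and the compare helper is a made-up name for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def compare(a, b):
    """Return (plain '=', null-safe IS, COALESCE-based '=') as SQLite evaluates them."""
    return conn.execute(
        "SELECT ? = ?, ? IS ?, COALESCE(?, '') = COALESCE(?, '')",
        (a, b, a, b, a, b),
    ).fetchone()

print(compare("x", "x"))    # (1, 1, 1)
print(compare(None, None))  # (None, 1, 1)
print(compare(None, ""))    # (None, 0, 1)
```

The last line shows the flaw of the Nz-style comparison: a missing value and a zero-length string are reported as equal, while the null-safe comparison keeps them apart.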

Philippe Grondier
+2  A: 

As with many things, there are good and bad points to each.

Good points about default values: they give you the ability to set a column to a known value if no other value is given. For example, when creating BOOLEAN columns I commonly give the column a default value (TRUE or FALSE, whatever is appropriate) and make the column NOT NULL. In this way I can be confident that the column will have a value, and that it'll be set appropriately.

Bad points about default values: not everything has a default value.

Good things about NULLs: not everything has a known value at all times. For example, when creating a new row representing a person I may not have values for all the columns - let's say I know their name but not their birth date. It's not appropriate to put in a default value for the birth date - people don't like getting birthday cards on January 1st (if that's the default) if their birthday is actually July 22nd.

Bad things about NULLs: NULLs require careful handling. In most databases built on the relational model as commonly implemented, NULLs are poison: the presence of a NULL in a calculation causes the result of the calculation to be NULL. NULLs used in comparisons can also cause unexpected results, because any comparison with NULL returns UNKNOWN (which is neither TRUE nor FALSE). For example, consider the following PL/SQL script:

declare 
  nValue NUMBER;  -- never assigned, so it starts out as NULL
begin
  IF nValue > 0 THEN
    dbms_output.put_line('nValue > 0');
  ELSE
    dbms_output.put_line('nValue <= 0');
  END IF;

  IF nValue <= 0 THEN
    dbms_output.put_line('nValue <= 0');
  ELSE
    dbms_output.put_line('nValue > 0');
  END IF;
end;

The output of the above is:

nValue <= 0
nValue > 0

This may be a little surprising. You have a NUMBER (nValue) which is both less than or equal to zero and greater than zero, at least according to this code. The reason this happens is that nValue is actually NULL, and all comparisons with NULL result in UNKNOWN instead of TRUE or FALSE. This can result in subtle bugs which are hard to figure out.
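The same effect can be reproduced outside Oracle. The following is a sketch only, assuming SQLite via Python's sqlite3 module, with CASE expressions standing in for the IF blocks; the classify helper and the third, NULL-aware branch are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def classify(n_value):
    """Evaluate both IF blocks as CASE expressions, plus a NULL-aware variant."""
    return conn.execute(
        """
        SELECT CASE WHEN :n > 0  THEN 'nValue > 0'  ELSE 'nValue <= 0' END,
               CASE WHEN :n <= 0 THEN 'nValue <= 0' ELSE 'nValue > 0'  END,
               CASE WHEN :n IS NULL THEN 'nValue is NULL'
                    WHEN :n > 0     THEN 'nValue > 0'
                    ELSE 'nValue <= 0' END
        """,
        {"n": n_value},
    ).fetchone()

print(classify(None))  # ('nValue <= 0', 'nValue > 0', 'nValue is NULL')
print(classify(5))     # ('nValue > 0', 'nValue > 0', 'nValue > 0')
```

With NULL, both comparisons fall through to their ELSE branch, which is why the first two answers contradict each other. Testing IS NULL first, as in the third expression, removes the ambiguity.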

Share and enjoy.

Bob Jarvis
+2  A: 

It depends on the situation, but it's really ultimately simple. Which one is closer to the truth?

A lot of people deal with data as though it's just data, and truth doesn't matter. However, whenever you talk to the stakeholders in the data, you find that truth always matters. Sometimes more, sometimes less, but it always matters.

A default value is useful when you may presume that if the user (or other data source) had provided a value, the value would have been the default. If this presumption does more harm than good, then NULL is better, even though dealing with NULL is a pain in SQL.

Note that there are three different ways default values can be implemented. First, in the application, before inserting new data. The database never sees the difference between a default value provided by the user or one provided by the app!

Second, by declaring a default value for the column, and leaving the data missing in an insert.

Third, by substituting the default value at retrieval time, whenever a NULL is detected. Only a few DBMS products permit this third mode to be declared in the database.
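The three modes can be sketched side by side. This assumes SQLite through Python's sqlite3 module; the task table, its columns and APP_DEFAULT_PRIORITY are invented for illustration, and the third mode is written into the query with COALESCE rather than declared in the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (name TEXT NOT NULL, priority INTEGER DEFAULT 3)")

APP_DEFAULT_PRIORITY = 3  # hypothetical application-level default

# 1. The application fills in the default before inserting.
conn.execute("INSERT INTO task (name, priority) VALUES (?, ?)",
             ("a", APP_DEFAULT_PRIORITY))

# 2. The column is left out of the INSERT, so the declared DEFAULT applies.
conn.execute("INSERT INTO task (name) VALUES (?)", ("b",))

# 3. NULL is stored as-is and replaced only when the data is read.
conn.execute("INSERT INTO task (name, priority) VALUES (?, NULL)", ("c",))

rows = conn.execute(
    "SELECT name, priority, COALESCE(priority, 3) FROM task ORDER BY name"
).fetchall()
print(rows)  # [('a', 3, 3), ('b', 3, 3), ('c', None, 3)]
```

Rows a and b are indistinguishable once stored, whereas row c still records that no priority was ever supplied.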

In an ideal world, data is never missing. If you are developing for the real world, required data will eventually be missing. Your applications can either do something that makes sense or something that doesn't make sense when that happens.

Walter Mitty
+1  A: 

Nulls and default values are different things used for different purposes. If you are trying to avoid using nulls by giving everything a default value, that is a poor practice, as I will explain.

Null means we do not know what the value is or will be. For instance, suppose you have an enddate field. You don't know when the process being recorded will end, so null is the only appropriate value. Using a default value of some fake date way out in the future will cause as much trouble to program around as handling the nulls, and is more likely, in my experience, to create a problem with incorrect results being returned.

Now there are times when we might know what the value should be even if the person inserting the record does not. For instance, if you have a date inserted field, it is appropriate to have a default value of the current date and not expect the user to fill this in. You are likely to actually have better information that way for this field.

Sometimes it's a judgement call and depends on the business rules you have to apply. Suppose you have a speaker honoraria field (which is the amount a speaker would get paid). A default value of 0 could be dangerous, as it might mean that speakers are hired and we intend to pay them nothing. It is also possible that there may occasionally be speakers who are donating their time for a particular project (or who are employees of the company and thus not paid extra to speak), where zero is a correct value, so you can't use zero as the value to determine that you don't know how much this speaker is to be paid. In this case null is the only appropriate value, and the code should trigger an issue if someone tries to add the speaker to a conference. In a different situation, you may already know that the minimum any speaker will be paid is 3000 and that only speakers who have negotiated a different rate will have data entered in the honoraria field. In this case, it is appropriate to put in a default value of 3000. In other cases, different clients may have different minimums, so the default should be handled differently (usually through a lookup table that automatically populates the minimum honoraria value for that client on the data entry form).

So I feel the best rule is to leave the value as null if you truly cannot know, at the time the data is entered, what the value of the field should be. Use a default value only if it has meaning all the time for that particular situation, and use some other technique to fill in the value if it could be different under different circumstances.

HLGEM
+2  A: 

In a Data Warehouse, you would always want to have default values rather than NULLs.

Instead you would have values such as "unknown", "not ready" or "missing".

This allows INNER JOINs to be performed efficiently on the Fact and Dimension tables, as 'everything always has a value'.
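A small sketch of why this matters, assuming SQLite via Python's sqlite3 module; the fact_sale and dim_region tables and the reserved key 0 for the "unknown" member are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE fact_sale  (sale_id   INTEGER PRIMARY KEY,
                             region_id INTEGER,
                             amount    INTEGER NOT NULL);
    INSERT INTO dim_region VALUES (0, 'unknown'), (1, 'north');
    INSERT INTO fact_sale  VALUES (1, 1, 100), (2, NULL, 50);
""")

total_sql = """
    SELECT SUM(f.amount)
    FROM fact_sale f
    INNER JOIN dim_region d ON d.region_id = f.region_id
"""

# With a NULL key, the second sale matches no dimension row and is dropped.
total_with_null = conn.execute(total_sql).fetchone()[0]
print(total_with_null)  # 100

# Pointing the fact row at the 'unknown' member keeps it in the join.
conn.execute("UPDATE fact_sale SET region_id = 0 WHERE region_id IS NULL")
total_with_unknown = conn.execute(total_sql).fetchone()[0]
print(total_with_unknown)  # 150
```

With NULL keys, the only way to keep the second sale in the total is an outer join; the 'unknown' dimension row lets every report use a plain INNER JOIN.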

adolf garlic
A: 

As one responder already said, NULL is not a value.

Be very wary of anything proclaimed by anyone who speaks of "the NULL value" as if it were a value.

NULL is not equal to itself. x=y does not yield true if both x and y are NULL (it yields UNKNOWN, which a WHERE clause treats like false). x=y yields true if both x and y are the default value.

There are almost endless consequences to this seemingly very simple difference. And most of those consequences are booby traps that bite you real bad.
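One such trap, sketched with SQLite via Python's sqlite3 module (the table t, its column v and the count helper are invented for illustration): a row holding NULL satisfies neither a condition nor its negation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v TEXT)")
# One row holds NULL, the other holds a zero-length string as a default.
conn.executemany("INSERT INTO t (v) VALUES (?)", [(None,), ("",)])

def count(where):
    """Count the rows of t that satisfy the given WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {where}").fetchone()[0]

equal_to_itself = count("v = v")            # only the '' row qualifies
not_equal_to_itself = count("NOT (v = v)")  # no row qualifies
is_null = count("v IS NULL")                # the NULL row shows up only here

print(equal_to_itself, not_equal_to_itself, is_null)  # 1 0 1
```

The two counts for "v = v" and "NOT (v = v)" add up to one row out of two: the NULL row silently disappears from both results.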

Erwin Smout
A: 

Two very good Access-oriented articles about Nulls by Allen Browne:

Aspects of working with Nulls in VBA code:

The articles are Access-oriented, but could be valuable to those using any database, particularly relative novices because of the conversational style of the writing.

David-W-Fenton
A: 

Nulls NEVER save storage space in DB2 for OS/390 and z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not.

DB2 for Linux, Unix, and Windows has a compression option that allows columns set to null to save space. Using this option causes DB2 to eliminate the unused space from a row where columns are set to null. This option is not available on the mainframe, though.

REF: http://www.craigsmullins.com/bp7.htm

So, the best modeling practice for DB2 z/OS is to use "NOT NULL WITH DEFAULT" as a standard for all columns. The same standard is followed in some major shops I know. It makes programmers' lives easier, since they don't have to handle the null indicator, and it actually saves storage by eliminating the extra byte needed for the null indicator.

Prasad Alla, Data Architect

Prasad Alla