I typically default during design to NOT NULL
unless a reason is given otherwise - particularly money/decimal columns in accounting - there is usually never an unknown aspect to these. There might be a case where a money column was optional (like a survey or business relationship system where you put your household/business income - this might not be known until/if a relationship is formed by the account manager). For datetime, I would never allow a NULL
RecordCreated column, for instance, while a BirthDate column would allow NULL
.
NOT NULL
columns remove a lot of potential extra code and ensures that users will not have to account for NULL
s with special handling - especially good in presentation layer views or data dictionaries for reporting.
I think it's important during design time to devote a great deal of time handling data types (char vs. varchar, vs. nchar vs. nvarchar, money vs. decimal, int vs. varchar, GUID vs. identity), NULL/NOT NULL, primary key, choice of clustered index and non-clustered indexes and INCLUDE columns. I know that probably sounds like everything in DB design, but if answers to all those questions are understood up front, you will have a much better conceptual model.
Note that even in a database where there are no columns allowed to be NULL
, a LEFT JOIN
in a view can result in a NULL
For a concrete case of the decision process, let's take a simple case of Address1, Address2, Address3, etc all varchar(50) - a pretty common scenario (which might be better represented as a single TEXT column, but let's assume it's modelled this way). I would not allow NULLs and I would default to empty string. The reason for this is:
1) It's not really unknown - it's blank. The nature of UNKNOWN between the multiple columns is never going to be well-defined. It is highly unlikely you would have a KNOWN Address1 and an UNKNOWN Address2 - you either know the whole address or you don't. Unless you are going to have constraints, let them be blank and don't allow NULLs.
2) As soon as people start naively doing things like Address1 + @CRLF + Address2 - NULLs start to NULL
out the entire address! Unless you are going to wrap them in a view with ISNULL
, or change you ANSI NULL settings, why not let them be blank - after all, it's the way they are viewed by users.
I would use probably the same logic for a Middle Name or Middle initial, depending on how it's used - is there a difference between someone without a middle name or someone where it's unknown?
In some cases, I would probably not even allow empty string - and I would do this by constraint. For instance - First and Last Name on a patient, Company Name on a customer. These should never be blank nor empty (or all whitespace or similar). The more of these constraints that are in place, the better your data quality and the sooner you catch silly mistakes like import issues, NULL propagation etc.