views:

265

answers:

9

I have several questions regarding where to handle nulls. Let me set up a scenario. Imagine I have a table that has 5 varchar(50) columns to use as an example when providing reasons for using nulls or empty strings.

  1. Is it better to handle NULLS in code or in the database? By this I mean, is it better to assign an empty string to a varchar(50) if it contains no value or is it better to assign null to the varchar(50) and handle that null in code?

  2. Does assigning an empty string to a column affect performance overhead?

  3. How does using a null vs. an empty string affect indexing?

  4. I am under the impression that if you do not allow your database to contain nulls, you do not have to handle it in code. Is this statement true?

  5. Do other datatypes besides varchars pose the same problems when using a default value or is it more problematic with string datatypes?

  6. What is the overhead of using the ISNULL function if the table contains nulls?

  7. What are other Advantages/Disadvantages?

+2  A: 

The main advantage is that you can handle null and empty strings separately in both the .NET and SQL code - they can, after all, mean different things.

The downside is you need to be careful; in .NET you have to not call obj.SomeMethod() on null, and in SQL you need to watch that nulls tend to propagate when combined (unlike, for example, C# string concatenation).

There isn't really a noticeable size difference between null and empty. In the .NET code I'd hope that it uses the interned empty string, but it isn't going to matter hugely.

Marc Gravell
Marc, can you elaborate a bit more?
Xaisoft
Also, are you saying it the main advantage is to use nulls in the databsase? Just want to clarify.
Xaisoft
On what, exactly? I'm saying that the advantage is that null and blank are different, at both layers. If you mean null, use null. If you mean an empty-string, use an empty string! The two are not the same.
Marc Gravell
Ok, imagine you treat both as the same. Is there a performance advantage/disadvantage? Do you have to check for nulls in the application if you don't accept nulls in the db in the first place?
Xaisoft
A: 

I think a null value and an empty string are two different things both in code and in a database. A variable or field being null means it has no value, but if either is an empty string, it does have a value which happens to be the empty string.

Rian Schmits
+6  A: 

My general advice is to declare fields in a database as NOT NULL unless you have a specific need to allow null values as they tend to be very difficult for people new to databases to handle.

Note that an empty string and a null string field do not necissiarly mean the same thing (unless you define them to). Often null means "unknown" or "not provided", whereas an empty string is just that, a provided and known empty string.

Allowing or disallowing null fields depends entirely on your needs.

Donnie
I disagree with your advice to avoid nulls because people new to databases will get confused. Instead of avoiding nulls, avoid giving access to your production database to people new to databases!
marcc
I agree that you should declares fields as NOT NULL if possible, but I don't agree that you should dumb down your database design to make it easy for people who are new to databases. If NULL is the right choice, use NULL. If some new programmer doesn't understand NULLs, show them the manual.
Mark Byers
+2  A: 

NULL is stored more efficently (NULL bit map) then empty string (2 bytes for varchar length, or "n" for char)

Storage engine blog: Why is the NULL bitmap in a record an optimization?

I've seen some articles that say different, but for char/varchar I've found NULL to be be useful and tend to treat empty string the same as NULL. I've also found the NULL is quicker in queries than empty string too. YMMV of course and I'll evaluate each case on it's own merits.

gbn
A: 

1: Very subjective, as noted by other answers there's a tangible difference between NULL (no answer/unknown) and "" (known to be nothing/not applicable - i.e. a person without a middle name).

2: It shouldn't do.

3: AFAIK (I'm still a junior/learning DBA, so take this with a grain of salt), but there should be no effect.

4: This is arguable. In theory if you apply a NOT NULL constraint to a database field, then you should never have to handle a NULL value. In practice, the gap between theory and practice is smaller in theory than in practice. (In other words, you should probably still handle being given a NULL even if it's theoretically impossible.)

Simon Righarts
Outer join queries make it possible to still have null values returned from queries. Ifnull should be used or null values should be handled in the code.
Jason Down
+2  A: 

You are intermixing an implementation concern with a logical data architecture concern.

You should decide whether or not to allow nulls in a field purely based on whether it accurately models the data you expect to store in the database. Part of the confusion, as a few others have pointed out, is that null and empty strings are not just two ways of storing the same information.

Null means either there is no value or the value is unkown.
Empty string means there is a value and it is an empty string.

Let me demonstrate with an example. Say for example you have a middle name field and need to differentiate between situations where the middle name hasn't been populated and when the person doesn't have a middle name. Use the empty string to indicate that there is no middle name and null to indicate it hasn't been entered.

In almost all cases where a null makes sense in terms of the data it they should be handled in the application code, not the database under the assumption that the DB needs to differentiate between two different states.

The Short Version: Don't pick null vs empty string based on performance/storage concerns in the DB, pick the one that best models the information you are trying to store.

JohnFx
The final comment under "the short version" is worth an upvote all by itself.
Walter Mitty
A: 

I typically default during design to NOT NULL unless a reason is given otherwise - particularly money/decimal columns in accounting - there is usually never an unknown aspect to these. There might be a case where a money column was optional (like a survey or business relationship system where you put your household/business income - this might not be known until/if a relationship is formed by the account manager). For datetime, I would never allow a NULL RecordCreated column, for instance, while a BirthDate column would allow NULL.

NOT NULL columns remove a lot of potential extra code and ensures that users will not have to account for NULLs with special handling - especially good in presentation layer views or data dictionaries for reporting.

I think it's important during design time to devote a great deal of time handling data types (char vs. varchar, vs. nchar vs. nvarchar, money vs. decimal, int vs. varchar, GUID vs. identity), NULL/NOT NULL, primary key, choice of clustered index and non-clustered indexes and INCLUDE columns. I know that probably sounds like everything in DB design, but if answers to all those questions are understood up front, you will have a much better conceptual model.

Note that even in a database where there are no columns allowed to be NULL, a LEFT JOIN in a view can result in a NULL

For a concrete case of the decision process, let's take a simple case of Address1, Address2, Address3, etc all varchar(50) - a pretty common scenario (which might be better represented as a single TEXT column, but let's assume it's modelled this way). I would not allow NULLs and I would default to empty string. The reason for this is:

1) It's not really unknown - it's blank. The nature of UNKNOWN between the multiple columns is never going to be well-defined. It is highly unlikely you would have a KNOWN Address1 and an UNKNOWN Address2 - you either know the whole address or you don't. Unless you are going to have constraints, let them be blank and don't allow NULLs.

2) As soon as people start naively doing things like Address1 + @CRLF + Address2 - NULLs start to NULL out the entire address! Unless you are going to wrap them in a view with ISNULL, or change you ANSI NULL settings, why not let them be blank - after all, it's the way they are viewed by users.

I would use probably the same logic for a Middle Name or Middle initial, depending on how it's used - is there a difference between someone without a middle name or someone where it's unknown?

In some cases, I would probably not even allow empty string - and I would do this by constraint. For instance - First and Last Name on a patient, Company Name on a customer. These should never be blank nor empty (or all whitespace or similar). The more of these constraints that are in place, the better your data quality and the sooner you catch silly mistakes like import issues, NULL propagation etc.

Cade Roux
What is null propagation?
Xaisoft
NULL propagation is when a NULL is used in an expression (without wrapper logic) - the expression results in NULL. This can be subtle. For instance SUM(t.col1 + t.col2) is not the same as SUM(t.col1) + SUM(t.col2) when there are rows where only col1 or col2 are NULL. So if NULLs are allowed, you need to wrap them to produce defaults or otherwise account for them appropriately to your application requirements. i.e. in application terms, just because something is UNKNOWN in database terms DOES NOT MEAN that the result in application terms is UNKNOWN.
Cade Roux
A: 

Putting faked up data (Empty string for string data, 0 for numbers, some riduculaous date for dates) instead of null in a database is almost always a poor choice. Those faked up values do not mean the same thing and especially for numeric data, it is hard to get a faked value that isn't the same as a real value. And when you put in bad data you still have to write code around it to make sure things are handled correctly (such as not returning those records which don;t have an end date) so you actually save nothing onthe development side.

If you cannot know the data at the time the record is inserted null is the best choice. That said, if the data will alawys be known use not null wherever possible.

HLGEM
A: 

You should look into sixth normal form. 6NF was specifically invented to get rid of the problems introduced by the use of NULLS. A lot of those problems are made worse by SQL's three valued logic (true, false, unknown), and the programmer's common use of two valued logic.

In 6NF, every time a row/column intersection would have to be flagged as NULL, the situation can be handled by simply omitting the row.

However, I generally do not try for 6NF in database design. Most of the time, NULLable columns are not used as part of search criteria or join criteria, and the problems with NULLS don't surface.

Walter Mitty