tags:

views:

747

answers:

16

I'm just stepping into a project and it has a fairly large database backend. I've started digging through this database and 95% of the fields are nullable.

Is this normal practice in the database world? I'm just a lowly programmer, not a DBA but I would think you would want to keep nullable fields to a minimum, only where they make sense.

Is it a "code smell" if most columns are nullable?

+1  A: 

As a best practice, if a column shouldn't be nullable, then it should be marked as such. However, I don't believe in going completely insane with things like this.

Cody C
+2  A: 

I think so. If you don't need the data, then it's not important to your business. If it is important to your business, it should be required.

Chris McCall
Sure, a credit card number is required before a user can buy anything (for example) but they should still be allowed to save other attributes, and then add the credit card number later. If you block them from entering *any* data because they don't have the required fields, that's just going to make them upset.
Bill Karwin
Which is why those things don't belong in the same table, not why credit card number should be nullable in the table of credit card orders, right?
Doug McClean
It's just an example. The point is that there may be attributes of a given table that are important for your business, but not for data integrity.
Bill Karwin
I've yet to see a working example where this is true and not a design artifact.
Chris McCall
+11  A: 

Default values are typically the exception and NULLs are the norm, in my experience.

True, nulls are annoying.

It's also extremely useful because null is the best indicator of "NO VALUE". A concrete default value is very misleading, and you can lose information or introduce confusion down the road.

hythlodayr
+7  A: 

Don't know if I consider it always a bad thing, but if the columns are being added because a single record (or maybe a few) need to have values while most don't, then it indicates a pretty flat table structure. If you're seeing column names like "addr1", "addr2", "addr3", then it stinks!

I would bet that most of the columns you have could be removed and represented in other tables. You could find the "non-null" ones through a foreign key relationship. This will increase the joins that you'll be doing, but it could be more preformant that doing a "where not col1 is null".

Todd R
How would you store the various lines of an address other than columns named addr1, addr2, addr3? (OR were you referring 3 separate complete addresses?) Addresses are one of the standard example cases for nulls. Some addresses have 2 lines, some have 6.
jmucchiello
I interpreted Addr1 through AddrX to be placeholders for mailingAddress, physicalAddress, workAddress, xmasAddress, etc. Otherwise, it'd be AddrLine1, AddrLine2.
p.campbell
Yeah, maybe address was a bad example - probably should have used phone numbers. What typically shows up (in bad schemas) is "homeaddr", "workaddr", "vacationaddr", "otheraddr", "otheraddr2", etc., all because one record needed "workaddr", another needed "vacationaddr" (without "workaddr"), and so on. AddrLine1 and AddrLine2 are fine.
Todd R
+1  A: 

This is all completely dependent on the scope and requirements of the project. I wouldn't use number of nullable fields alone as a metric for poorly written or designed code. Have a look at the business domain, if there are many non nullable fields represented there that are nullable in the database, then you have some issues.

Matthew Vines
+2  A: 

No, whether or not a field should be nullable is a data concept and can't be a code smell. Whether or not NULLs are annoying to code has nothing to do with the usefulness of having nullable data fields.

jmucchiello
+3  A: 

They are a (very common) smell, I'm afraid. Look up C.J. Date writings on the topic.

Bruno Martinez
Really? C. J. Date thinks that NULLs are not a legitimate part of the relational model, and even if they were, are implemented wrong in SQL. So his writings on this topic might be considered to be an extreme opinion.
Bill Karwin
They might be considered an extreme opinion if it weren't for the fact that he's right, and has a pretty much airtight case for why he is right (at least as far as the "[nulls] are implemented wrong in SQL" part goes). A type generator like Optional or Maybe (or whatever you want to call it) is a useful thing, but when Missing = Missing evaluates to a magical third boolean called Unknown with all kinds of wacky and inconsistently properties, that's a problem for everyone (including the optimizer).
Doug McClean
+10  A: 

Anyone who has developed a data entry application knows how common it is for some of the fields to be unknown at the time of entry -- even for columns that are business-critical, to address @Chris McCall's answer.

However, a "code smell" is merely an indicator that something might be coded in a sloppy way. You use smells to identify things that need more investigation, not necessarily things that must be changed.

So yes, if you see nullable columns so consistently, you're right to be suspicious. It might indicate that someone was being lazy, or afraid to declare NOT NULL columns unequivocally. You can justify doing your own analysis.

Bill Karwin
+4  A: 

I think nullable columns should be avoided. Wherever the semantics of the domain make it possible to use a value that clearly indicates missing data, it should be used instead of NULL.

For instance, let's imagine a table that contains a Comment field. Most developers would place a NULL here to indicate that there's no data in the column. (And, hopefully, a check constraint that disallows zero-length strings so that we have a well-known "value" to indicate the lack of a value.) My approach is usually the opposite. The Comment column is NOT NULL and a zero-length string indicates the lack of a value. (I use a check constraint to ensure that the zero-length string is really a zero-length string, and not whitespace.)

So, why would I do this? Two reasons:

  1. NULLs require special logic in SQL, and this technique avoids that.
  2. Many client-side libraries have special values to indicate NULL. For instance, if you use Microsoft's ADO.NET, the constant DBNull.Value indicates a NULL, and you have to test for that. Using a zero-length string on a NOT NULL column obviates the need.

Despite all of this, there are many circumstances in which NULLs are fine. In fact, I have no objection to their use in the scenario above, although it wouldn't be my preferred way.

Whatever you do, be kind to those who will use your tables. Be consistent. Allow them to SELECT with confidence. Let me explain what I mean by this. I recently worked on a project whose database was not designed by me. Nearly every column was nullable and had no constraints. There was no consistency about what represented the absence of a value. It could be NULL, a zero-length string, or even a bunch of spaces, and often was. (How that soup of values got there, I don't know.)

Imagine the ugly code a developer has to write to find all of those records with a missing Comment field in this scenario:

SELECT * FROM Foo WHERE LEN(ISNULL(Comment, '')) = 0

Amazingly there are developers who regard this as perfectly acceptable, even normal, despite possible performance implications. Better would be:

SELECT * FROM Foo WHERE Comment IS NULL

Or

SELECT * FROM Foo WHERE Comment = ''

If your table is properly designed, the above two SQL statements can be relied upon to produce quality data.

Gregory Higley
I have to disagree. NULL means unknown, regardless of the data type of the column. It should always be used to mean unknown, and magic values like empty string should never be used to mean unknown.
John Saunders
On the other hand, if we know that the user elected not to leave a comment, why would we use something that means "unknown" to represent that knowledge?
Joel Mueller
@john-saunders It depends on the domain. A zero-length string in a comment field can represent a very clearly-defined, extremely well-known value: "no comment". This is quite different from "unknown".However, these are semantic games that even Dr. Codd played. He later came up with several alternatives to NULL to indicate things like unknown, missing, etc.The important thing here is to *be consistent*.
Gregory Higley
Because the value of the comment is unknown.
John Saunders
This has nothing to do with the problem domain. If the user left a zero-length comment, then it's a zero-length comment. If no comment was left, then there is no known value for the comment. If the solution chooses to treat these two separate concepts the same, then that's fine, but they're still two separate concepts.
John Saunders
Perhaps the problem here is that I foolishly (and loosely) used the phrase "missing data" in my first paragraph, which is not exactly what I meant. It's more nuanced than that, and again, it depends upon the semantics of the domain, but it's something I try not to get too pedantic about.I should stress that I *do* use nullable columns. There are many situations in which they are completely appropriate. It depends on the meaning and type of the data you are storing.
Gregory Higley
For the comment example, a null value should signify "We never asked the user for a comment", while a "" value should signify "We presented the user a form with an optional comment box, and they submitted the form without putting anything in it". In most cases I can envision there being a "comments" column, the latter situation is the only likely one.
Brian Schroth
+1  A: 

In my experience, it is a problem when Null and Not Null don't match up to the required field /not required field.

It is in the realm of possibility that those really are all optional fields. If you find in the business tier or the UI tier that those fields are required, then I think this means the data model has drifted away from the business object model and is a sign of overly conservative DB change policies, or oversight.

If you run a sample data generator on your data, and then try to load the data that is valid according to SQL, you would find out right away if the rules match up.

MatthewMartin
A: 

That seems like a lot, it probably means you should at least investigate. Note that if this is mature product with a lot of data, convincing anyone to change the structure may be difficult. The earlier in the design phase you catch something like this the easier it is to fix up all the related code to adjust for the change.

Whether it is bad that they used the nulls would depend on whether the columns allowing nulls look as if they should be related tables (home phone, cell phone, business phone etc which should be in aspearate phone table) or if they look like things that might not be applicable to all records (possibly could bea related table with a one-to-one relationship)or might not be known at the time of data entry (probably ok). I would also check to see if they in fact alwAys do have a value (then you might be able to change to not null if the information is genuinely required by the busniess logic). If you have a few records with null

HLGEM
A: 

In my experience, a lot nullable field in a large database like you have is very normal. Considering it perhaps is used by a lot of applications written by different people. Making columns nullable is annoying but it is perhaps the best way to keep the application robust.

Victor
It is depressingly common; it is not good, and generally does not make the application robust.
Jonathan Leffler
A: 

One of the many ways to map inheritance (e.g. c# objects) to a database is to create a table for the class at the top of the hierarchy, then add the columns for all the other classes. The columns have to be nullable for when an object of a different subclass is stored in the database. This is called Single-table inheritance mapping (or Map Hierarchy To A Single Table) and is a standard design pattern.

A side effect of Single-table inheritance mapping is that most columns are nullable.


Also in Oracle an empty string (0 length) is considered to be null, therefore in some companies all strings columns are made nullable even on SqlServer. (just because the first customer wants the software on SqlServer does not mean the 2nd customer does not have a Oracle DBA that will not let SqlServer onto there network)

Ian Ringrose
Yet, when it gets to the stage that most columns are null, I think it's time to consider mapping to multiple tables. That will make it possible to enforce some constraints on the derived tables.
John Saunders
But changing the ORM system the application uses can be a big risk. In the end, the database is there to serve the application not the other way round. (I am a C# programmer not a DBA after all)
Ian Ringrose
Who said anything about changing the ORM system? Just change how the ORM maps to the underlying database. Besides, this can allow additional constraints to be enforced, improving the quality of the system as a whole.
John Saunders
Assuming that the ORM system that was chosen 5 years ago lets you
Ian Ringrose
And as a Developer that is very antiDBA I would say constraints on those tables are inherently bad in the first place. I would fully agree with Ian, the database is meant to serve the application to often I've seen the database dictating the application. That IS wrong.
Chris Marisic
+5  A: 

I'm of the Extreme NO camp: I avoid NULLs all the time. Putting aside fundamental considerations about what they actually mean (because talk to different people, you'll get different answers such as "no value", "unknown value", "missing", "my ginger cat called Null"), the worst problem NULLs cause is that they often ruin your queries in mysterious ways.

I've lost count of the number of times I've had to debug someone's query (okay, maybe 9) and traced the problem to a join against a NULL. If your code needs ISNULL to repair joins then the chances are you've also lost index applicability and performance with it.

If you do have to store a "missing/unknown/null/cat" value (and it's something I prefer to avoid), it is better to be explicit about it.

Those skilled at NULLs may disagree. NULL use tends to split SQL crowds down the middle.

In my experience, heavy NULL use has been positively correlated with database abuse but I wouldn't carve this into stone tablets as some Law of Nature. My experience is just my experience.

EDIT: Additional thought. It is possible that those who are anti-null racists like myself are more excited by normalization than those who are pro-NULL. I don't think rabid normalizers would be too happy with ragged edges on their tables that can take NULLs. Lots of nulls may indicate that the the database developers are not into heavy normalisation. So rather than NULL suggesting code is "bad" it may alternatively suggest the philosophical position of the developers on normalisation. Maybe this is reaching. Just a thought.

Joel Goodwin
How do you feel about the integer value zero, which results in division-by-zero errors when used improperly? Does this mean we should disallow the use of zero?
Bill Karwin
Also, your cat example is spurious. The string 'Null' is not the same as a SQL NULL. But it does make me wonder how you'd make a poster searching for that cat if it went missing. ;-)
Bill Karwin
If it's Schrodinger's cat, then it can be alive or dead, so makes it relevant in a sort of "what is it?" way =) Div by zero errors are consistent, in your face and pretty obvious; it's a fundamental arithmetic problem that's you have to live with. NULL tends to be stealthy like a ninja, you're not sure you've got a NULL problem until you've been had - plus NULL join behaviour can be inconsistent across platforms. I would argue that it isn't a fundamental concept with a definitive meaning and standard set of behaviours, unlike div by zero. (And 'Null' != SQL NULL has caught many a dev out)
Joel Goodwin
Re: the divide by zero comment, that's an application issue, not a database one. I personally am a big fan of 0 value for an int column, rather than null. Dividing by null can be its own issue.
Chris
My point is that zero and NULL both deserves some special handling. ANSI SQL is clear on the standard semantics of NULL (Oracle's VARCHAR2 behavior notwithstanding). And anyone who can't tell the difference between NULL and 'Null' is not paying attention in class!
Bill Karwin
I would not simply equate the two, these are completely different levels of special. Zero: divide by zero and query fails. NULL: It is not COUNTed as a value; put it amongst some operations and it acts as a super-zero, wiping out the entire operation with NULL; it cannot be used in equality, nor inequality - "val <> 3" will not return the val=NULL row) unless you are explicit with IS NULL. This leads to the surprise that SELECT * FROM data WHERE (val=3) AND (val<>3) does not return all rows. More on wikipedia, http://en.wikipedia.org/wiki/Null_(SQL).
Joel Goodwin
I see your point, but I view NULL as a very similar case to zero, with respect to the fact that you have to check for the presence of NULL when writing certain types of expressions. I do understand the semantics of NULL, but I maintain that it's a useful and valid part of the language when used correctly. Just because some developers don't understand how it works doesn't mean we should make blanket rules against using it.
Bill Karwin
I totally agree. As I pointed out - those skilled at using NULLs would probably disagree with my stance. Putting aside arguments about relational fundamentals, I would summarise my position as this: that the NULL is so exceptional in operation that it's more likely to cause harm than normalising them out of the solution. In over a decade of SQL work, I have met many who used NULLs, but few who were aware of it's subtleties [maybe it's a symptom of my sector - but that's another topic]. Bill, I count you in as being aware =)
Joel Goodwin
Thanks. Yeah, I agree 100% that a lot of developers don't understand how to use NULL correctly. I'm writing a book on "SQL Antipatterns" and I have a chapter about misuse of NULL -- either treating NULL as an ordinary value, or treating an ordinary value as NULL.
Bill Karwin
+3  A: 

In short, I would say yes, this is probably a code smell.

Whether a column is nullable or not is very important and should be determined carefully. The question should be assessed for every column. I am not a believer in a single "best practices" default for NULL. The "best practice" for me is to address the nullability thoroughly during the design and/or refactoring of the table.

To start with, none of your primary key columns are going to be nullable. Then, I strongly lean towards NOT NULL for anything which is a foreign key.

Some other things I consider:

Criteria where NULL should be strongly avoided: money columns - is there really a possibility that this amount will be unknown?

Criteria where NULL can be justified most frequently: datetime columns - there are no reserved dates, so NULL is effectively your best option

Other data types: char/varchar columns - for codes/identifiers - NOT NULL almost exclusively int columns - mostly NOT NULL unless it's something like "number of children" where you want to distinguish an unknown response.

Cade Roux
A: 

To throw the opposite opinion out there. Every single field in a database should nullable. There is nothing more frustrating than working with a database that on every single insert throws an exception about required this or required that. Nothing should be required.

There is one exception to that, keys. Obviously all primary and foreign keys should be enforced to exist.

It should be the application's job to validate data and the database to simply store and retrieve what you give it. Having it process validation logic even as simple as null or not null makes a project way more complex to maintain for having different rules spread over everything.

Chris Marisic