views:

335

answers:

8

I would like to know about how NULL values affect query performance in SQL Server 2005.

I have a table similar to this (simplified):

ID | ImportantData | QuickPickOrder
--------------------------
1  | 'Some Text'   | NULL
2  | 'Other Text'  | 3
3  | 'abcdefg'     | NULL
4  | 'whatever'    | 4
5  | 'it is'       | 2
6  | 'technically' | NULL
7  | 'a varchar'   | NULL
8  | 'of course'   | 1
9  | 'but that'    | NULL
10 | 'is not'      | NULL
11 | 'important'   | 5

And I'm doing a query on it like this:

SELECT   *
FROM     MyTable
WHERE    QuickPickOrder IS NOT NULL
ORDER BY QuickPickOrder

So the QuickPickOrder is basically a column used to single out some commonly chosen items from a larger list. It also provides the order in which they will appear to the user. NULL values mean that it doesn't show up in the quick pick list.

I've always been told that NULL values in a database are somehow evil, at least from a normalization perspective, but is it an acceptable way to filter out unwanted rows in a WHERE constraint?

Would it be better to use specific number value, like -1 or 0, to indicate items that aren't wanted? Are there other alternatives?

EDIT: The example does not accuratly represent the ratio of real values to NULLs. An better example might show at least 10 NULLs for every non-NULL. The table size might be 100 to 200 rows. It is a reference table so updates are rare.

A: 

NULL looks fine to me for this purpose. Performance is likely to be basically the same as with a non-null column and constant value, or maybe even better for filtering out all NULLs.

Lucero
+2  A: 

They do not have a negative performance hit on the database. Remember, NULL is more of a state than a value. Checking for NOT NULL vs setting that value to a -1 makes no difference other than the -1 is probably breaking your data integrity, imo.

northpole
+3  A: 

SQL Server indexes NULL values, so this will most probably just use the Index Seek over an index on QuickPickOrder, both for filtering and for ordering.

Quassnoi
if the table's column is 50% null (similar to the given sample data), I'd think it may tend to do a index scan
KM
@KM: why doing an Index Scan here? It may do a Table Scan / Clustered Index Scan to avoid RID Lookups / Key Lookups, but we have a range condition here, so an Index Seek will always be superior.
Quassnoi
won't it ignore the index and scan since most values are the same (null)?
KM
@KM: it may do it, but it won't be an INDEX Scan, it will be a TABLE Scan or CLUSTERED Index Scan (which is same as Table Scan for clustered tables). Index Scan implies traversing whole index on QuickPickOrder, filtering out the wrong values and then joining with the table using Key Lookup / RID Lookup to fetch the * requested by SELECT clause. Index Seek does the same but starts from the first non-NULL value, so the NULL's are just left over.
Quassnoi
+1  A: 

The alternative is to normalize QuickPickOrder into a table with a foreign key, and then perform an inner join to filter the nulls out (or a left join with a where clause to filter the non-nulls out).

mgroves
A: 

NULL looks good to me as well. SQL Server has many kinds of indices to choose from. I forget which ones do this, but some only index values in a given range. If you had that kind of index on the column being tested, the NULL valued records would not be in the index, and the index scan would be fast.

Paul Chernoch
+2  A: 

Another alternative would be two tables:

MyTable:

ID | ImportantData
------------------
1  | 'Some Text'
2  | 'Other Text'
3  | 'abcdefg'
4  | 'whatever'
5  | 'it is'
6  | 'technically'
7  | 'a varchar'
8  | 'of course'
9  | 'but that'
10 | 'is not'
11 | 'important'

QuickPicks:

MyTableID   | QuickPickOrder
--------------------------
2           | 3
4           | 4
5           | 2
8           | 1
11          | 5

SELECT   MyTable.*
FROM     MyTable JOIN QuickPicks ON QuickPickOrder.MyTableID = MyTable.ID
ORDER BY QuickPickOrder

This would allow updating QuickPickOrder without locking anything in MyTable or logging a full row transaction for that table. So depending how big MyTable is, and how often you are updating QuickPickOrder, there may be a scalability advantage.

Also, having a separate table will allow you to add a unique index on QuickPickOrder to ensure no duplication, and could be more easily scaled later to allow different kinds of QuickPicks, having them specific to certain contexts or users, etc.

richardtallent
A: 

Having a lot of NULLs in a column which has an index on it (or starting with it) is generally beneficial to this kind of query.

NULL values are not entered into the index, which means that inserting / updating rows with NULL in there doesn't take the performance hit of having to update another secondary index. If, say, only 0.001% of your rows have a non-null value in that column, the IS NOT NULL query becomes pretty efficient as it just scans a relatively small index.

Of course all of this is relative, if your table is tiny anyway, it makes no appreciable difference.

MarkR
A: 

SQL Server's performance can be affected by using NULLS in your database. There are several reasons for this.

First, NULLS that appear in fixed length columns (CHAR) take up the entire size of the column. So if you have a column that is 25 characters wide, and a NULL is stored in it, then SQL Server must store 25 characters to represent the NULL value. This added space increases the size of your database, which in turn means that it takes more I/O overhead to find the data you are looking for. Of course, one way around this is to use variable length fields instead. When NULLs are added to a variable length column, space is not unnecessarily wasted as it is with fixed length columns.

Second, use of the IS NULL clause in your WHERE clause means that an index cannot be used for the query, and a table scan will be performed. This can greatly reduce performance.

Third, the use of NULLS can lead to convoluted Transact-SQL code, which can mean code that doesn't run efficiently or that is buggy.

Ideally, NULLs should be avoided in your SQL Server databases.

Instead of using NULLs, use a coding scheme similar to this in your databases:

  • NA: Not applicable
  • NYN: Not yet known
  • TUN: Truly unknown

Such a scheme provides the benefits of using NULLs, but without the drawbacks.

Prashant