views:

219

answers:

5

Quick question for the DBA's out there:

Say I have 2 columns on my table, IsDeleted (bit) and DeletedDate (datetime). The table contains approx 10,000,000 rows.

IsDeleted is a computed column that checks to see if DeletedDate is NULL; and it returns 1 if it is not, and 0 if it is.

Querying this table will mainly be done on the IsDeleted column.

Could anybody give me some suggestions on where I should apply my index?

Applying it to the IsDeleted field brings in a crop of problems due to SET QUOTED IDENTIFIER being off, which isn't necessarily a deal breaker, but would cause some additional work.

Would I see any benefit applying it to DeletedDate, even though I'm not querying that field directly? Should I just bite the bullet and add it to IsDeleted? Is the performance difference between the two negligible?

Thanks again; and if you'd like any clarifications; leave me a comment and I'll update my post.

A: 

I would not imagine you would see any benefit to adding it to DeletedDate. However, if you're not sure, it should be pretty easy to test the performance both ways.

Phil Sandler
+2  A: 

It doesn't make sense to put an index on a bit column because it is not selective enough. When executing a query, SQL Server determines the most appropriate indexes to use. If your index is not selective enough, it will be ignored or it may decide to do an index scan instead of an index seek. Either way, it won't really help all that much.

Putting the index on the DeletedDate could possibly help with some queries, but filtering on NULL vs. "any value" will probably not be that much help either because of the selectivity.

I encourage you to read this: Seek Vs. Scan

G Mastros
+1  A: 

Placing an index on an attribute whose values are limited to a very small domain (obviously two-valued is the smallest possible) does not make sense except for special edge cases, (such as when the rows are distributed 90%-10% between the 2 values)

This is because any use of the index to find one of the values (assuming the rows are evenly distributed approximately 50-50) will return about half the total rows in the table. If the balanced-tree (B-Tree) index you would create is three or four levels deep, that means 3 or 4 IO operations per row retrieved, which would be more than the number of rows in the table.

Charles Bretana
+1  A: 

You can't put an index on IsDeleted if the computation is based on the current datetime, because the result of the computed column is non-deterministic. It's time based, and potentially has a different outcome on every invocation. See this msdn article for details:

For example, if the table has integer columns a and b, the computed column a+b may be indexed, but computed column a+DATEPART(dd, GETDATE()) cannot be indexed because the value may change in subsequent invocations.

If the date compare becomes too expensive, you'll have to schedule an update statement to run every 'x' time to set the IsDeleted value for 'expired' dates:

UPDATE MyTable SET IsDeleted=1 WHERE IsDeleted=0 AND DeletedDate < getutcdate()

Edit: I misread the question initially, when the computation is NULL vs non-NULL it will be deterministic. With a deterministic result, the PERSISTED keyword can be used to store the result of the null check:

IsDeleted AS DeletedDate IS NOT NULL PERSISTED

This avoids running datetime null checks all the time. It stores the result in the table, until you update the DeletedDate column. You need to test wether this actually pays off though, I don't think the DeletedDate NULL check will be very expensive.

Indexing either property probably doesn't make much sense because you basically want to separate 2 groups: deleted and non-deleted.

Sander Rijken
Is this true? Checking whether the field is null or not doesn't seem like it would logically be non-deterministic.
Phil Sandler
oh I misunderstood that, I thought you were comparing DeletedDate with the current date, Wonder wether to edit, or pull the answer and re-answer
Sander Rijken
A: 

Say the distribution is: 98% IsDeleted = 0 2% IsDeleted = 1

would SQL Server be clever enough to store only the info for the IsDeleted records in an index ?

Klaus