ansaurus

Question

SQL Indexing - Computed Column vs Field Used by Computed Column

Answer 1

A:

I would not expect you to see any benefit from adding an index on DeletedDate. However, if you're not sure, it should be pretty easy to test the performance both ways.
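A rough sketch of such a test, assuming a table named dbo.MyTable with a DeletedDate column; the index name is hypothetical, and the SELECT should be replaced with a query your application actually runs:

```sql
-- Report logical reads and elapsed time in the Messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Baseline: run the query before the candidate index exists.
SELECT * FROM dbo.MyTable WHERE DeletedDate IS NULL;

-- Candidate index on the source column (hypothetical name).
CREATE NONCLUSTERED INDEX IX_MyTable_DeletedDate
    ON dbo.MyTable (DeletedDate);

-- Same query again; compare the reads and timings with the baseline.
SELECT * FROM dbo.MyTable WHERE DeletedDate IS NULL;

-- Remove the index again if it did not help.
DROP INDEX IX_MyTable_DeletedDate ON dbo.MyTable;
```

Comparing the actual execution plans of the two runs also shows whether the optimizer picked the new index at all.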

Phil Sandler 2009-12-16 21:02:32

Answer 2

+2 A:

It doesn't make sense to put an index on a bit column because it is not selective enough. When executing a query, SQL Server determines the most appropriate indexes to use. If your index is not selective enough, the optimizer will either ignore it or do an index scan instead of an index seek. Either way, it won't really help all that much.

Putting the index on the DeletedDate could possibly help with some queries, but filtering on NULL vs. "any value" will probably not be that much help either because of the selectivity.

I encourage you to read this: Seek Vs. Scan

G Mastros 2009-12-16 21:10:57

Answer 3

+1 A:

Placing an index on an attribute whose values are limited to a very small domain (obviously two-valued is the smallest possible) does not make sense except for special edge cases, such as when the rows are distributed 90%-10% between the two values.

This is because any use of the index to find one of the values (assuming the rows are distributed approximately 50-50) will return about half the total rows in the table. If the balanced-tree (B-tree) index you would create is three or four levels deep, that means 3 or 4 I/O operations per row retrieved; summed over half the table, that is more I/O operations than there are rows in the table.
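To put numbers on that, here is the same estimate worked through under assumed figures (N = 1,000,000 rows and an index depth of d = 3; both values are illustrative, not taken from the question):

```latex
\begin{align*}
\text{I/O}_{\text{index}} &\approx \frac{N}{2} \cdot d = \frac{1\,000\,000}{2} \cdot 3 = 1\,500\,000 \\
\text{I/O}_{\text{scan}} &\le N = 1\,000\,000
\end{align*}
```

This follows the simplified per-row model used above. A real table scan reads pages that each hold many rows, so the scan side is usually far smaller than N, which only widens the gap.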

Charles Bretana 2009-12-16 21:16:58

Answer 4

+1 A:

You can't put an index on IsDeleted if the computation is based on the current datetime, because the result of the computed column is non-deterministic. It's time-based and potentially has a different outcome on every invocation. See this MSDN article for details:

For example, if the table has integer columns a and b, the computed column a+b may be indexed, but computed column a+DATEPART(dd, GETDATE()) cannot be indexed because the value may change in subsequent invocations.

If the date comparison becomes too expensive, you'll have to schedule an update statement to run at a regular interval to set the IsDeleted value for 'expired' dates:

UPDATE MyTable SET IsDeleted=1 WHERE IsDeleted=0 AND DeletedDate < getutcdate()

Edit: I misread the question initially. When the computation is NULL vs. non-NULL, it is deterministic. With a deterministic result, the PERSISTED keyword can be used to store the result of the null check:

IsDeleted AS CAST(CASE WHEN DeletedDate IS NOT NULL THEN 1 ELSE 0 END AS bit) PERSISTED

This avoids running the datetime null check on every read. The result is stored in the table and only recomputed when you update the DeletedDate column. You need to test whether this actually pays off, though; I don't think the DeletedDate NULL check will be very expensive.
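If you want to confirm how SQL Server classifies the column, something along these lines should work. It is a sketch that assumes dbo.MyTable already exists with a DeletedDate column and no IsDeleted column yet. The NULL check is wrapped in CASE because T-SQL does not accept a bare predicate as a column expression.

```sql
-- Add the computed column; CASE turns the NULL test into a 0/1 value,
-- and CAST makes the column type bit instead of int.
ALTER TABLE dbo.MyTable
    ADD IsDeleted AS CAST(CASE WHEN DeletedDate IS NOT NULL THEN 1 ELSE 0 END AS bit) PERSISTED;

-- Ask the engine whether it considers the column deterministic and indexable.
-- COLUMNPROPERTY returns 1 for true, 0 for false, NULL for invalid input.
SELECT
    COLUMNPROPERTY(OBJECT_ID('dbo.MyTable'), 'IsDeleted', 'IsDeterministic') AS IsDeterministic,
    COLUMNPROPERTY(OBJECT_ID('dbo.MyTable'), 'IsDeleted', 'IsIndexable')     AS IsIndexable;
```

A definition that calls GETDATE() or GETUTCDATE() is the non-deterministic case described earlier in this answer, and the same check can be used to see how the engine reports it.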

Indexing either property probably doesn't make much sense because you basically want to separate 2 groups: deleted and non-deleted.

Sander Rijken 2009-12-16 21:30:39

Is this true? Checking whether the field is null or not doesn't seem like it would logically be non-deterministic.

Phil Sandler 2009-12-16 21:38:54

Oh, I misunderstood that; I thought you were comparing DeletedDate with the current date. I wonder whether to edit, or pull the answer and re-answer.

Sander Rijken 2009-12-16 21:40:52

Answer 5

A:

Say the distribution is: 98% IsDeleted = 0, 2% IsDeleted = 1.

Would SQL Server be clever enough to store only the info for the IsDeleted = 1 records in an index?

Klaus 2010-06-21 10:30:40
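
On the question above: an ordinary index holds an entry for every row, but SQL Server 2008 and later support filtered indexes, which store entries only for rows matching a WHERE predicate. Here is a minimal sketch, assuming the MyTable and DeletedDate names used earlier in the thread; the index name is hypothetical. The predicate is written against DeletedDate rather than IsDeleted because a filtered index predicate cannot reference a computed column.

```sql
-- Index only the (rare) deleted rows; rows with a NULL DeletedDate get no entry.
CREATE NONCLUSTERED INDEX IX_MyTable_Deleted
    ON dbo.MyTable (DeletedDate)
    WHERE DeletedDate IS NOT NULL;

-- A query whose predicate matches the index filter is a candidate for using it.
SELECT DeletedDate
FROM dbo.MyTable
WHERE DeletedDate IS NOT NULL;
```

Whether the optimizer actually picks the filtered index depends on the query and the data, so it is worth checking the execution plan.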
