views:

22

answers:

2

Hi,

I have 270 million records in a table. Currently I have a non clustered index in it on a date column. 99% of time I use rows with date > 1/1/2008..that means 140 million out of it. I am using SQL server 2008.In this situation will it be beneficial to use filtered index other than normal non-clustered index ?

Also if I use "date" datatype instead of "datetime" how beneficial it is ?

Thanks in advance !

+1  A: 

Yes, it's possible that the filtered index will be helpful. If you have a common filter, such as "WHERE MyColumn IS NOT NULL" to get the 140 million rows, then that may be the way to create the index. The index will be built with keys that match the criteria, which makes for a much smaller set of index data.

The date key (3-byte storage) may also be better than datetime (8-byte storage) when you are dealing with large amounts of data. But, it only helps if you don't need the precision (time) you get from the datetime data type.

bobs
this column is not-null. How beneficial it is ..? any idea ? I mean how much time i can gain. I need quantitative analysis.
Anish
You can use any valid criteria to define a filtered index. For example, you can create the filtered index based on "MyDateColumn > '1/1/2009'" and you would only index data with date greater than that date. So, the nullability is not as important when determining whether to use the filtered index or not.
bobs
It's important that you create indexes that are useful for the queries you will be running. Each of these points you mention are important and you'll have to evaluate each index option with those queries.
bobs
Do u know any other feature in 2008 which can help in this situation ?
Anish
You may want to look into partitioning of table data. It may help with performance, but nothing helps more than proper indexing.
bobs
+1  A: 

Yes, the filtered non clustered index will be used for:

  • queries than scan very very few records, eg. have WHERE date ='20101016' (filter out a single day, few records from the 270M).
  • queries than scan larger date ranges, but touch only the date field: SELECT COUNT(date) FROM ... WHERE date BETWEEN '20080101' AND '20090101'

And that's about it. Any more complex query will not use the non-clustered index, filtered or not-filtered, because it will hit the index tipping point.

So in conclusion, for general queries on this table that have a WHERE date > '200080101' clause your proposed filtered non-clustered index will help... nothing. Furthermore, even if you would move the date as the clustered index leftmost key (which is the typical organization of time range queries time series, like your table seems to be, and you should consider it yourself), filtering out 'only' 140M out of 270M is hardly an optimization.

Proper indexing is a complex topic. I urge you to read everything in this MSDN link, including all the sub-chapters: Designing Indexes. This will give you the bare minimal knowledge to understand some more complicated articles and to be able to discern right from baloney on the plethora of misinformation available out there.

Remus Rusanu