views:

289

answers:

10

I have a table A that has a startDate and an end dateDate as 2 datetime columns besides some more other columns. I have another table B that has one datetime column call it dates column. This is in SQL Server 2005.

Here the question: How to best set up the indexes etc to get the following:

select ....
 from A , B
where A.startDate >= B.dates
  and A.endDate < B.dates

Both tables have several thousand records.

A: 

If you need to optimize try to run this query in the Query Analyzer.

bassfriend
A: 

Hi, every version of sql server 2000, 2005, 2008 has a program called DataBase tuning advisor when you run some query it tells you what indexes you need to add to get the query faster Best Regards, Iordan

IordanTanev
A: 

You need 3 indexes A.startDate, B.dates and A.endDate, may be index (A.endDate+A.startDate) is also good. I have no details on another columns and purposes for these tables, but review possibility to use clustered index.

In anyway use "Execution plan" option to make decision between all these variants, because my suggestion is too general

Dewfy
A: 

The following script will list possible missing indices (you may filter the statement by t.name).

SELECT     t.name AS 'affected_table',
           'Create NonClustered Index IX_' + t.name + '_missing_' + CAST(ddmid.index_handle AS VARCHAR(10)) + ' On ' + ddmid.STATEMENT + ' (' + ISNULL(ddmid.equality_columns, '') +
           CASE
                      WHEN ddmid.equality_columns IS NOT NULL
                             AND ddmid.inequality_columns IS NOT NULL
                      THEN ','
                      ELSE ''
           END + ISNULL(ddmid.inequality_columns, '') + ')' + ISNULL(' Include (' + ddmid.included_columns + ');', ';') AS sql_statement,
           ddmigs.user_seeks,
           ddmigs.user_scans,
           CAST((ddmigs.user_seeks + ddmigs.user_scans) * ddmigs.avg_user_impact AS INT) AS 'est_impact',
           ddmigs.last_user_seek
FROM       sys.dm_db_missing_index_groups      AS ddmig
INNER JOIN sys.dm_db_missing_index_group_stats AS ddmigs
ON         ddmigs.group_handle = ddmig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS ddmid
ON         ddmig.index_handle = ddmid.index_handle
INNER JOIN sys.tables AS t
ON         ddmid.OBJECT_ID = t.OBJECT_ID
WHERE      ddmid.database_id = DB_ID()
       AND CAST((ddmigs.user_seeks + ddmigs.user_scans) * ddmigs.avg_user_impact AS INT) > 100
ORDER BY   CAST((ddmigs.user_seeks + ddmigs.user_scans) * ddmigs.avg_user_impact AS INT) DESC;
Scoregraphic
A: 

I would just add a clustered index on B.dates. If you add indexes on startDate and endDate it won't buy anything because you'll get index scans on A anyway. The clustered index on B gets you an index seek in B at least. A Table Scan and Index Scan are the same thing so there's no point in adding indexes to get the word Table Scan out of your execution plan :)

I'd mock it up a few ways or see if you can redo your query to not require a table scan on A which I'm guessing isn't really possible.

Jon
"A Table Scan and Index Scan are the same thing"? I don't think so, unless you mean when the index has all the columns in the table. I highly doubt that his tables only have the mentioned columns in them.
ongle
Yes they are the same thing. An index scan retrieves every row from a table while a seek does not. A table scan (or index scan) where the is no other clustered index on the table gives you the worst performance.
Jon
An index scan retrieves every "row" from the index, not the table. An index scan is better than a table scan simply because an index normally has fewer columns in it than the table, resulting in more "rows" per read. But we agree that scans are bad and to be avoided.
ongle
The OP said that the table in question would only have a few thousands rows in it so it's doubtful that anyone would notice any difference in the query speed because of the implications that you're discussing here. I mocked up a few examples and didn't see any difference on the data size the OP was discussing when I tried nonclustered and clustered indexes on the various columns in tableA referenced.
Jon
As we don't know what other columns the table has and what other querie are done, we can not tell if puting a clustered index on would be helpful
Ian Ringrose
@Ian: Good point. It's also possible that there's already one on that table but if the table just has dates then it should be helpful.
Jon
A: 

A little more information is needed. How many other columns are in the tables? Are these existing tables with lots of queries already going against them, or all new tables? What kind of performance issue are you seeing that lead you to ask the question?

I'm assuming that all three columns are NOT NULL (not just for the query syntax, but for the index usefulness).

I would start with a compound index on A.startDate + A.endDate, and another index on B.dates (but this is likely not needed). Unless these dates are the primary purpose of the tables, I would avoid creating clustered indices on these columns. This is doubly true if these tables are existing tables with other queries running against them. Previous queries may be written expecting the existing clustered indices.

ongle
+2  A: 

I have worked at two companies (both doing time and attendance management systems) that have lots of times with startDate and endDate columns. In my experience there is no good indexes that always works with date ranges.

Try indexes like (startDate, -endDate) and (-endDate, startDate) to see if they help, a lot depends on what the data in the table is like. E.g if you tend to have lots of old rows with an endDate before the dates you are looking for, forcing Sql to use an index based on (endDate, startDate) may help.

Also try using an index that covers all columns that are in your “where” statement, so sql does not need to read the main table until it has worked out what rows to return.

You may have to use index hints, as it is unlikely that the query processor knows enough about the data to make a good choose of indexes – this is one of very few cases when I have had to consider index hints.

Expanding the data, so you have a table that contains (date, rowed) with a row for each date within the date range may be needed. However keeping the "index" table updated is a pain.

If you know that some of your date ranges don't overlap, have a look at Using CROSS APPLY to optimize joins on BETWEEN conditions (E.g an employee's sickness records may not be allowed to overlap)

At the end of the day if you only have several thousand records, a full table scan is not to bad.

Quassnoi subjects using SPATIAL indexes, I have no experience with ”abusing” spatial indexes in this way, but I think it is worth trying. However be very careful if you will have to every support multiply database vendors, as spatial index are rather new. Also you may still need the date columns for reporting tools etc.

(Sooner or later will need to be able to find all rows that overlaps a date range, then it become even harder to get indexes that returns good results.)

Ian Ringrose
+1  A: 

a useful link: Using CROSS APPLY to optimize joins on BETWEEN conditions

AlexKuznetsov
+1 as this is in interesting ideal that *may* help the OP
Ian Ringrose
A: 

I'd go with this

CREATE CLUSTERED INDEX IX_DateRange ON dbo.A
 (
 StartDate,
 EndDate DESC
 ) 
GO
RBarryYoung
+3  A: 

Update:

See this article in my blog for efficient indexing strategy for your query using computed columns:

The main idea is that we just compute rounded length and startDate for you ranges and then search for them using equality conditions (which are good for B-Tree indexes)


In MySQL and in SQL Server 2008 you could use SPATIAL indexes (R-Tree).

They are particularly good for the conditions like "select all records with a given point inside the record's range", which is just your case.

You store the start_date and end_date as the beginning and the end of a LineString (converting them to UNIX timestamps of another numeric value), index them with a SPATIAL index and search for all such LineStrings whose minimum bounding box (MBR) contains the date value in question, using MBRContains.

See this entry in my blog on how to do this in MySQL:

and a brief performance overview for SQL Server:

Same solution can be applied for searching a given IP against network ranges stored in the database.

This task, along with you query, is another often used example of such a condition.

Plain B-Tree indexes are not good if the ranges can overlap.

If they cannot (and you know it), you can use the brilliant solution proposed by @AlexKuznetsov

Also note that this query performance totally depends on your data distribution.

If you have lots of records in B and few records in A, you could just build an index on B.dates and let the TS/CIS on A go.

This query will always read all rows from A and will use Index Seek on B.dates in a nested loop.

If your data are distributed other way round, i. e. you have lots of rows in A but few in B, and the ranges are generally short, then you could redesign your tables a little:

A

start_date interval_length

, create a composite index on A (interval_length, start_date)

and use this query:

SELECT  *
FROM    (
        SELECT  DISTINCT interval_length
        FROM    a
        ) ai
CROSS JOIN
        b
JOIN    a
ON      a.interval_length = ai.interval_length
        AND a.start_date BETWEEN b.date - ai.interval_length AND b.date
Quassnoi
This is the first good reason I have seen for making software depend on SQL Server 2008 (rather then 2005)
Ian Ringrose
I don’t know how spatial index combine with other types of index, eg “where department = 123 and SickLeave OVERLAPS WorldCub”
Ian Ringrose
`@Ian`: in `SQL Server`, the indexes will be used apart and then merged using an appropriate merge method (most probably `HASH MATCH`)
Quassnoi
Sorry, what I meant was, I could create an index on (DepartmentId, SicknesStartDate) then if I did a search for all employee in a department that was sick at a given time the index may be used. However can I created “compound” (for want of a better word) indexes that include R-Trees?
Ian Ringrose
`@Ian`: you cannot. `B-Tree` indexes rely on sorting, and sorting on `(col1, col2, com3)` implies sorting on `(col1, col2)` and `col1`, so with `B-Tree`, you get `3` indexes for the price of one. This is not the case for `R-Tree`.
Quassnoi