I have a table with columns

Index, Date

where an Index may have multiple Dates, and my goal is the following: select a list that looks like

Index, MinDate, MaxDate

where each Index is listed only once, and MinDate (MaxDate) represents the earliest (latest) date present in the entire table for that index. That's easy enough, but then let's constrain this list to appear only for Indexes that are present in a given range of dates.

So far, I have the following:

SELECT 
    [Index],
    MIN([Date]),
    MAX([Date])
FROM myTable
WHERE
    [Index] IN
    (SELECT [Index] FROM myTable WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000')
GROUP BY [Index]
ORDER BY [Index] ASC

This is excruciatingly slow. Any way to speed this up? [I am running SQL Server 2000.]

Thanks!

Edited: For clarity.

A: 

@Vinko

I don't believe there is one, no.

Jake
A: 

You don't need the sub-select in the where clause. Also, you could add an index on the date column. How many rows are in the table?

SELECT
    [INDEX],
    MIN ( [Date] ),
    MAX ( [Date] )
FROM
    myTable
WHERE 
    [Date] Between '1/1/2000' And '12/31/2000'
GROUP BY
    [Index]
ORDER BY
    [INDEX] ASC
John
Sorry, that doesn't work because this gives the MinDate and MaxDate as extremes for the given range, but we need them for the _entire table_.
Jake
You need the max and min for the entire table but only the items that have a record in the year 2000?
Eduardo Molteni
Yes. =)
Jake
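To make the distinction in the comments above concrete, here is a minimal sketch with hypothetical sample data (the `CREATE TABLE` and the rows are made up; only the table and column names come from the question). Index 1 has rows before, during and after 2000; index 2 has none in 2000. Putting the range in the WHERE clause filters rows before grouping, so MIN/MAX shrink to the 2000 rows; using the range only to choose which indexes qualify keeps the whole-table extremes. The one-row `INSERT`s are deliberate, since SQL Server 2000 has no multi-row `VALUES`, and the ISO-style date literals match the ones used elsewhere in this thread.

```sql
-- Hypothetical sample data; [Index] is bracketed because INDEX is a reserved word.
CREATE TABLE myTable ([Index] INT NOT NULL, [Date] DATETIME NOT NULL);
INSERT INTO myTable ([Index], [Date]) VALUES (1, '1998-03-01');
INSERT INTO myTable ([Index], [Date]) VALUES (1, '2000-06-15');
INSERT INTO myTable ([Index], [Date]) VALUES (1, '2003-09-30');
INSERT INTO myTable ([Index], [Date]) VALUES (2, '1999-01-01');
INSERT INTO myTable ([Index], [Date]) VALUES (2, '2001-05-05');

-- Range in WHERE: rows are filtered before GROUP BY, so MIN/MAX only see 2000.
-- Index 1 comes back with MinDate = MaxDate = 2000-06-15.
SELECT [Index], MIN([Date]) AS MinDate, MAX([Date]) AS MaxDate
FROM myTable
WHERE [Date] >= '2000-01-01' AND [Date] < '2001-01-01'
GROUP BY [Index];

-- Range used only to pick the indexes: MIN/MAX see every row of those indexes.
-- Index 1 comes back with 1998-03-01 and 2003-09-30; index 2 is not listed.
SELECT [Index], MIN([Date]) AS MinDate, MAX([Date]) AS MaxDate
FROM myTable
WHERE [Index] IN (SELECT [Index] FROM myTable
                  WHERE [Date] >= '2000-01-01' AND [Date] < '2001-01-01')
GROUP BY [Index];
```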
A: 

Putting a clustered index on the date column would greatly speed up this query, but obviously it may slow down other currently fast running queries on the table.

Chris Aitchison
+2  A: 

I am not an SQL Server expert, but if you can do sub-selects like so, this is potentially faster.

-- DISTINCT: list each Index once, not once per matching row in 2000
SELECT DISTINCT m.[Index],
  (SELECT MIN([Date]) FROM myTable WHERE [Index] = m.[Index]),
  (SELECT MAX([Date]) FROM myTable WHERE [Index] = m.[Index])
FROM myTable m 
WHERE m.[Date] BETWEEN '1/1/2000' AND '12/31/2000'
Greg Ogle
So far, this is the fastest without creating a new table.
Jake
This is probably how I would have done it.
Valerion
+4  A: 

I would recommend a derived table approach. Like this:

SELECT 
     myTable.[Index],
     MIN(myTable.[Date]),
     MAX(myTable.[Date])
FROM myTable
     Inner Join (
       SELECT [Index] 
       From myTable 
       WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000') As AliasName
       On myTable.[Index] = AliasName.[Index]
GROUP BY myTable.[Index]
ORDER BY myTable.[Index] ASC

EDIT: Upon further review, there is another way you can create this query. The following query may be faster, slower, or execute in the same amount of time. This, of course, depends on how the table is indexed.

Select [Index],
       Min([Date]),
       Max([Date])
From   myTable
Group By [Index]
Having Sum(Case When [Date] Between '1/1/2000' And '12/31/2000' Then 1 Else 0 End) > 0

Under the best circumstances, this query will cause an index scan (not a seek) to filter out the rows you don't want to display. I encourage you to run both queries and pick the one that executes fastest.

G Mastros
First query should perform as the original. Second query might be 1 table scan (very very good). Estimated Execution Plan will confirm.
David B
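G Mastros's HAVING version works because the CASE expression turns "is this row in the range?" into a 1 or a 0, so summing it per group counts an index's in-range rows. Any index with at least one such row survives the HAVING filter, while MIN and MAX still see all of that index's rows, which is why one pass over the table can be enough. Below is a minimal sketch with hypothetical data; the extra RowsIn2000 column exists only to make the count visible, and the half-open range (`>= '2000-01-01' AND < '2001-01-01'`) is a swap for BETWEEN so that times on December 31 are not missed.

```sql
-- Hypothetical sample data.
CREATE TABLE myTable ([Index] INT NOT NULL, [Date] DATETIME NOT NULL);
INSERT INTO myTable ([Index], [Date]) VALUES (1, '1998-03-01');
INSERT INTO myTable ([Index], [Date]) VALUES (1, '2000-06-15');
INSERT INTO myTable ([Index], [Date]) VALUES (1, '2000-11-20');
INSERT INTO myTable ([Index], [Date]) VALUES (1, '2003-09-30');
INSERT INTO myTable ([Index], [Date]) VALUES (2, '1999-01-01');
INSERT INTO myTable ([Index], [Date]) VALUES (2, '2001-05-05');

-- One pass: MIN/MAX see every row in each group; the CASE only feeds HAVING.
SELECT [Index],
       MIN([Date]) AS MinDate,
       MAX([Date]) AS MaxDate,
       SUM(CASE WHEN [Date] >= '2000-01-01' AND [Date] < '2001-01-01'
                THEN 1 ELSE 0 END) AS RowsIn2000
FROM myTable
GROUP BY [Index]
HAVING SUM(CASE WHEN [Date] >= '2000-01-01' AND [Date] < '2001-01-01'
                THEN 1 ELSE 0 END) > 0
ORDER BY [Index];
-- Expected: one row for index 1 (1998-03-01, 2003-09-30, RowsIn2000 = 2);
-- index 2 has no rows in 2000 and is filtered out.
```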
+1  A: 

Jake,

I think you may need to look at this problem from a different point of view.

The grouped selection of **Index, Min(Date), Max(Date)** isn't going to change drastically over the course of a day, compared with the range of data it covers (presumably many years).

So one option would be to create a summary table based on the data in the main table... e.g.

   SELECT 
       [Index], 
       Min(Date) as MinDate, 
       Max(Date) as MaxDate
   INTO 
      MySummaryTable
   FROM 
      MyOriginalTable
   GROUP BY
      [Index]

This table could be dropped and recreated on a semi-regular (e.g. daily) basis via a SQL job. I'd also stick an index on its Index column.

Then, when you need to run your daily query:

SELECT DISTINCT  --ONE ROW PER INDEX, EVEN IF IT HAS MANY ROWS IN 2000
   summary.[Index],
   summary.MinDate,
   summary.MaxDate
FROM
   MyOriginalTable mot
   INNER JOIN MySummaryTable summary
      ON mot.[Index] = summary.[Index]  --THIS IS WHERE YOUR CLUSTERED INDEX WILL PAY OFF
WHERE
   mot.Date BETWEEN '2000-01-01' AND '2000-12-31' --THIS IS WHERE A SECOND NC INDEX WILL PAY OFF
Eoin Campbell
Very helpful. Thanks!
Jake
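A variation on Eoin's drop-and-recreate job, sketched here with hypothetical data: create the summary table once, with a primary key on [Index] (in SQL Server a primary key is clustered by default when the table has no other clustered index), and have the scheduled job empty and reload it inside a transaction. Keeping the table means its index survives each refresh. The job schedule itself is not shown, and the final query uses EXISTS so each summary row comes back at most once. Whether DELETE or TRUNCATE TABLE suits the reload better is a judgment call for your environment.

```sql
-- Hypothetical source data.
CREATE TABLE MyOriginalTable ([Index] INT NOT NULL, [Date] DATETIME NOT NULL);
INSERT INTO MyOriginalTable ([Index], [Date]) VALUES (1, '1998-03-01');
INSERT INTO MyOriginalTable ([Index], [Date]) VALUES (1, '2000-06-15');
INSERT INTO MyOriginalTable ([Index], [Date]) VALUES (1, '2003-09-30');
INSERT INTO MyOriginalTable ([Index], [Date]) VALUES (2, '1999-01-01');

-- Created once. The primary key gives the summary table its index on [Index].
CREATE TABLE MySummaryTable (
    [Index] INT      NOT NULL PRIMARY KEY,
    MinDate DATETIME NOT NULL,
    MaxDate DATETIME NOT NULL
);

-- Body of the scheduled refresh: empty and reload in one transaction, so other
-- sessions at the default isolation level are not shown a half-loaded table.
BEGIN TRANSACTION;
DELETE FROM MySummaryTable;
INSERT INTO MySummaryTable ([Index], MinDate, MaxDate)
SELECT [Index], MIN([Date]), MAX([Date])
FROM MyOriginalTable
GROUP BY [Index];
COMMIT TRANSACTION;

-- Daily query: EXISTS returns each summary row at most once,
-- however many of that index's rows fall in 2000.
SELECT s.[Index], s.MinDate, s.MaxDate
FROM MySummaryTable s
WHERE EXISTS (SELECT * FROM MyOriginalTable mot
              WHERE mot.[Index] = s.[Index]
                AND mot.[Date] >= '2000-01-01' AND mot.[Date] < '2001-01-01');
```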
A: 

Your explanation isn't very clear:

where each Index is listed only once, and MinDate (MaxDate) represents the earliest (latest) date present in the entire table.

If that is the case, you should either return two resultsets or store the answer such as:

DECLARE @MaxDate datetime, @MinDate datetime
SELECT
    @MinDate = MIN([Date]),
    @MaxDate = MAX([Date])
FROM myTable
--
SELECT
    [Index],
    @MinDate,
    @MaxDate
FROM myTable
WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000'

If you want to know the minimum/maximum for the entire table as well as for the [Index], then try the following in combination with the previous code:

SELECT
    [Index],
    MIN([Date]) AS IndexMinDate,
    MAX([Date]) AS IndexMaxDate,
    @MinDate AS TableMinDate,
    @MaxDate AS TableMaxDate
FROM myTable
WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000'
GROUP BY [Index]
ORDER BY [Index] ASC

Also look into indexing the columns, if possible, and check the query plan. Good luck.

Ryan
A: 

An EXISTS operator might be faster than the subquery:

SELECT
     t1.[Index],
     MIN(t1.[Date]),
     MAX(t1.[Date])
FROM
     myTable t1
WHERE
     EXISTS (SELECT * FROM myTable t2 WHERE t2.[Index] = t1.[Index] AND t2.[Date] >= '1/1/2000' AND t2.[Date] < '1/1/2001')
GROUP BY
     t1.[Index]

It would depend on table size and indexing I suppose. I like G Mastros HAVING clause solution too.

Another important note... if your date is actually a DATETIME and there is a time component in any of your dates (either now or in the future) you could potentially miss some results if an index had a date of 12/31/2000 with any sort of time besides midnight. Just something to keep in mind. You could alternatively use YEAR([Date]) = 2000 (assuming MS SQL Server here). I don't know if the DB would be smart enough to use an index on the date column if you did that though.

EDIT: Added GROUP BY and changed date logic thanks to the comment

Tom H.
Good point on the date logic. Yes - YEAR(Date) will foil index usage. Instead use '2000-01-01' <= Date AND Date < '2001-01-01' (exclusive end date logic). Also, Query seems to be missing a group by clause.
David B
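A minimal sketch of the time-component pitfall Tom H. and David B describe, using one hypothetical row stamped in the afternoon of December 31, 2000. `BETWEEN '2000-01-01' AND '2000-12-31'` treats the upper bound as midnight at the start of that day, so the row is missed; the exclusive end date catches it and, unlike `YEAR([Date]) = 2000`, compares the bare column, so an index on [Date] stays usable.

```sql
-- Hypothetical row with a time component on the last day of the year.
CREATE TABLE myTable ([Index] INT NOT NULL, [Date] DATETIME NOT NULL);
INSERT INTO myTable ([Index], [Date]) VALUES (7, '2000-12-31 15:30:00');

-- Upper bound '2000-12-31' means 2000-12-31 00:00:00, so this counts 0 rows.
SELECT COUNT(*) AS BetweenCount
FROM myTable
WHERE [Date] BETWEEN '2000-01-01' AND '2000-12-31';

-- Exclusive end date: everything before 2001-01-01, so this counts 1 row.
SELECT COUNT(*) AS HalfOpenCount
FROM myTable
WHERE [Date] >= '2000-01-01' AND [Date] < '2001-01-01';
```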
+1  A: 

This should do it in two table scans.

SELECT
    [Index],
    MIN([Date]),
    MAX([Date])
FROM myTable
WHERE
    [Index] IN
    (SELECT [Index] FROM myTable WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000')
GROUP BY [Index]
ORDER BY [Index] ASC
OPTION (MERGE JOIN)


Here's another query. This query gets different results than was originally asked for. This will get all Indexes that have date ranges that overlap the period of interest (even if there is not any actual activity in the period of interest for that index).

SELECT
    [Index],
    MIN([Date]),
    MAX([Date])
FROM myTable
GROUP BY [Index]
HAVING MIN([Date]) < '2001-01-01' AND MAX([Date]) >= '2000-01-01'
ORDER BY [Index] ASC

So this will return the following row, even if index 3 has no data in the year 2000:

3, 1998-01-01, 2005-01-01

David B
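To see how the overlap query above differs from the original requirement, here is a minimal sketch with hypothetical rows: index 3 mirrors the example row (data in 1998 and 2005 only), and index 4 has a row inside 2000. The HAVING query keeps both, because both overall date spans overlap 2000. The EXISTS form from Tom H.'s answer, written here with the exclusive end date, keeps only index 4, which is what the question asks for.

```sql
-- Hypothetical sample data.
CREATE TABLE myTable ([Index] INT NOT NULL, [Date] DATETIME NOT NULL);
INSERT INTO myTable ([Index], [Date]) VALUES (3, '1998-01-01');
INSERT INTO myTable ([Index], [Date]) VALUES (3, '2005-01-01');
INSERT INTO myTable ([Index], [Date]) VALUES (4, '2000-04-01');
INSERT INTO myTable ([Index], [Date]) VALUES (4, '2002-02-02');

-- Overlap test: the index's overall [MinDate, MaxDate] span intersects 2000.
-- Returns indexes 3 and 4.
SELECT [Index], MIN([Date]) AS MinDate, MAX([Date]) AS MaxDate
FROM myTable
GROUP BY [Index]
HAVING MIN([Date]) < '2001-01-01' AND MAX([Date]) >= '2000-01-01'
ORDER BY [Index];

-- Activity test: at least one row actually inside 2000.
-- Returns only index 4.
SELECT t1.[Index], MIN(t1.[Date]) AS MinDate, MAX(t1.[Date]) AS MaxDate
FROM myTable t1
WHERE EXISTS (SELECT * FROM myTable t2
              WHERE t2.[Index] = t1.[Index]
                AND t2.[Date] >= '2000-01-01' AND t2.[Date] < '2001-01-01')
GROUP BY t1.[Index]
ORDER BY t1.[Index];
```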