HI All
I was hoping someone could help me improve a query I have to run periodically. At the moment it takes more than 40 minutes to execute. It uses the full allocated memory during this time, but CPU usage mostly meanders at 2% - 5%, every now and then jumping to 40% for a few seconds.
I have this table (simplified example):
CREATE TABLE [dbo].[dataTable]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[dteEffectiveDate] [date] NULL,
[dtePrevious] [date] NULL,
[dteNext] [date] NULL,
[Age] [int] NULL,
[Count] [int] NULL
) ON [PRIMARY]
GO
Here are some input values:
INSERT INTO [YourDB].[dbo].[dataTable]
([dteEffectiveDate]
,[dtePrevious]
,[dteNext]
,[Age]
,[Count])
VALUES
('2009-01-01',NULL,'2010-01-01',40,300),
('2010-01-01','2009-01-01', NULL,40,200),
('2009-01-01',NULL, '2010-01-01',20,100),
('2010-01-01','2009-01-01', NULL,20,50),
('2009-01-01',NULL,'2010-01-01',30,10)
GO
Each entry has a dteEffectiveDate field. In addition, each has a dtePrevious and dteNext, which reflects the dates of the nearest previous/next effective date. Now what I want is a query that will calculate the mid-value on the Count fields between successive periods, within a specific age.
So for example, in the data above, for age 40 we have 300 at 2009/01/01 and 200 at 2010/01/01 so the query should produce 250.
Note that age 30 has only one entry, 10. This is at 2009/01/01. There is no entry at 2010/01/01, but we know that data was captured at this point, so the fact that there is nothing means that 30 is 0 at this date. Hence the query should produce 5.
In order to achieve this I use a FULL JOIN of the table on itself, and use ISNULL to select values. Here is my code:
SELECT
ISNULL(T1.dteEffectiveDate,T2.dtePrevious) as [Start Date]
,ISNULL(T1.dteNext,T2.dteEffectiveDate) as [End Date]
,ISNULL(T1.Age,T2.Age) as Age
,ISNULL(T1.[Count],0) as [Count Start]
,ISNULL(T2.[Count],0) as [Count End]
,(ISNULL(T1.[Count],0)+ISNULL(T2.[Count],0))/2 as [Mid Count]
FROM
[ExpDBClient].[dbo].[dataTable] as T1
FULL JOIN [ExpDBClient].[dbo].[dataTable] as T2
ON
T2.dteEffectiveDate = T1.dteNext
AND T2.Age = T1.Age
WHERE ISNULL(T1.dteEffectiveDate,T2.dtePrevious) is not null
AND ISNULL(T1.dteNext,T2.dteEffectiveDate) is not null
GO
which outputs:
Start Date End Date Age Count Start Count End Mid Lives
2009-01-01 2010-01-01 40 300 200 250
2009-01-01 2010-01-01 20 100 50 75
2009-01-01 2010-01-01 30 10 0 5
It works perfectly, but when I run this on the actual data, which is about 7m records, it takes painfully long to execute.
Does anyone have any suggestions?
Thanks
Karl