views:

296

answers:

6

Hi,

The following simple query takes a very long time (several minutes) to execute.

I have an index: create index IX on [fctWMAUA] (SourceSystemKey, AsAtDateKey)

SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
FROM [fctWMAUA] (NOLOCK) AS [t0]
WHERE SourceSystemKey in (1,2,3,4,5,6,7,8,9)
GROUP BY [t0].[SourceSystemKey]

The statistics are as follows:

  • logical reads 1827978
  • physical reads 1113
  • read aheads 1806459

Taking that exact same query and reformatting it as follows gives me these statistics:

  • logical reads 36
  • physical reads 0
  • read aheads 0

It takes 31ms to execute.

SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 1
 GROUP BY [t0].[SourceSystemKey]
UNION
 SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 2
 GROUP BY [t0].[SourceSystemKey]
UNION
 SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 3
 GROUP BY [t0].[SourceSystemKey]
/* AND SO ON TO 9 */

How do I make an index that does the group by quickly?

Regards Craig.

A: 

Try to tell SQL Server to use the index:

...
FROM [fctWMAUA] (NOLOCK, INDEX(IX)) AS [t0]
...

Make sure the statistics for the table are up to date:

UPDATE STATISTICS [fctWMAUA]

For better answers, turn on the showplan for both queries:

SET SHOWPLAN_TEXT ON

and add the results to your question.

You can also write the query without a GROUP BY. For example, you can use an exclusive LEFT JOIN excluding rows with older dates:

select cur.SourceSystemKey, cur.date
from fctWMAUA cur
left join fctWMAUA next
    on next.SourceSystemKey = next.SourceSystemKey
    and next.date > cur.date
where next.SourceSystemKey is null
and cur.SourceSystemKey in (1,2,3,4,5,6,7,8,9)

This can be surprisingly fast, but I don't think it could beat the UNION.

Andomar
Tried all your suggestions.Still very slow. Union is still fast.<pre> |--Stream Aggregate(GROUP BY:([t0].[SourceSystemKey]) DEFINE:([Expr1003]=MAX([partialagg1004]))) |--Parallelism(Gather Streams, ORDER BY:([t0].[SourceSystemKey] ASC)) |--Stream Aggregate(GROUP BY:([t0].[SourceSystemKey]) DEFINE:([partialagg1004]=MAX([KITE].[dbo].[fctWMAUA].[AsAtDateKey] as [t0].[AsAtDateKey]))) |--Index Seek(OBJECT:([KITE].[dbo].[fctWMAUA].[IX_AsAtDateSourceSystem] AS [t0]), SEEK:([t0].[SourceSystemKey] >= (1) AND [t0].[SourceSystemKey] <= (9)) ORDERED FORWARD)</pre>
Craig
I also re-ordering the fields in the index, and it doesn't change.
Craig
looking at the plan it kinda makes sense.That initial seek is going to find all the records. There are only nine source systems, and it's seeking the lot.
Craig
Does adding "OPTION (HASH GROUP)" or "OPTION (ORDER GROUP)" at the end of the query make any difference?
Andomar
Hi Andomar,Good suggestion. The hash group has brought it down to about fifteen seconds, which is acceptable if I cache the results.It's still odd that I can get 32ms from the union version and nothing close on the group by version. The union version does a seek and a top(1) for each of the queries which is super fast. The index cannot seem to duplicate that.
Craig
One last trick is an exclusive join (added to the answer)
Andomar
I restructured the query to query the source systems and then do an inner query to each of the max dates.This uses the index perfectly, and takes about 7ms.Much faster than group by, and also faster than union.select SourceSystems.SourceSystemKey , (select max(AsAtDateKey) from fctWMAUA where fctWMAUA.SourceSystemKey = SourceSystems.SourceSystemKey group by fctWMAUA.SourceSystemKey) MaxDatafrom SourceSystems
Craig
A: 
 WHERE SourceSystemKey = 3
 GROUP BY [t0].[SourceSystemKey]

You don't need to group by a fixed field.

Any way I prefer the first sentence. May be I will replace the

 WHERE SourceSystemKey in (1,2,3,4,5,6,7,8,9)

for something like

 WHERE SourceSystemKey BETWEEN 1 AND 9

or

 WHERE SourceSystemKey >= 1 AND SourceSystemKey <= 9

if SourceSystemKey is an integer. But I don't think it will cause a big change.

What I will test first is rebuild statistics and rebuild all indexes for the table and wait some time. Rebuilding is not instant, it will depend on how busy is the server but this sentence is well structured for the index be used by the optimizer.

Regards.

j.a.estevan
What do you mean by "You don't need to group by a fixed field" ? He's looking for the maximum date.
Andomar
I tried the between and it didn't change anything.It's using the index, and the initial index seek returns 665 million rows.Using the union it seeks one row (top 1) for each max ordered correctly, and its' super fast.Without the union it seeks 665 million rows and iterates the lot.It's crazy.Both queries definitely using the same indexes in the plan.
Craig
Andomar: I talked about the GROUP BY because if you put "WHERE SourceSystemKey = 3" I can see no sense in "GROUP BY SourceSystemKey" because there is only a SourceSystemKey. There is nothing to group, you are looking for the absolute MAX value that pass the WHERE filter. But any way the optimizer knows it and shouldn't be a problem. (edit: Talking about the second command. In the first case the GROUP BY is OK, obviously)
j.a.estevan
@j.a.estevan: SQL Server requires a GROUP BY before it lets you use aggregates like MAX()
Andomar
It is not requiered in this scenario. In general it is not required if you don't need to group data. Just try (for example): select max(object_id) from sys.tables where name like '%A%'This works fine in SQL Server 2005.
j.a.estevan
You are correct, the group by is not required in the union version of the query. I was copy/pasting the query.It doesn't affect the query plan though.
Craig
@j.a.estevan: Thanks, that'll save me a lot of typing!
Andomar
A: 

Have you tried creating another index just on the SourceSystemKey column? The high number of logical reads when you use that column in your where clause makes me think it is doing an index/table scan. Could you run the execution plan on this and see if that's the case? The execution plan might come up with an index suggestion as well.

Nir
+1  A: 

Its difficult to say without looking at an execution plan, however you might want to try the following:

SELECT * FROM
(
    SELECT MAX(t0.AsAtDateKey) AS [Date], t0.SourceSystemKey AS SourceSystem
    FROM fctWMAUA (NOLOCK) AS t0
    GROUP BY t0.SourceSystemKey
)
WHERE SourceSystem in (1,2,3,4,5,6,7,8,9)

Its difficult to tell without looking at an execution plan, but I think that whats happening is that SQL server is not clever enough to realise that the WHERE clause specified is filtering out the groups, and does not have any effect on the records included for each group. As soon as SQL server realises this its free to use some more inteligent index lookups to work out the maximum values (which is whats happening in your second query)

Just a theory, but it might be worth a try.

Kragen
A: 

Use HAVING instead of WHERE, so that the filtering happens AFTER grouping has occurred:

SELECT MAX(AsAtDateKey) AS [Date], SourceSystemKey AS SourceSystem
FROM fctWMAUA (NOLOCK)
GROUP BY SourceSystemKey
HAVING SourceSystemKey in (1,2,3,4,5,6,7,8,9)

I also don't particularly care for the IN clause, especially when it could be replaced with "<10" or "Between 1 and 9", which are used better by sorted indexes.

BradC
+1  A: 

Hi Guys,

I have found that the best solution is the following. It mimics the union version of the query, and runs very quickly.

40 logical reads, and an execution time of 3ms.

SELECT [t3].[value]
FROM [dimSourceSystem] AS [t0]
OUTER APPLY (
    SELECT MAX([t2].[value]) AS [value]
    FROM (
        SELECT [t1].[AsAtDateKey] AS [value], [t1].[SourceSystemKey]
        FROM [fctWMAUA] AS [t1]
        ) AS [t2]
    WHERE [t2].[SourceSystemKey] = ([t0].[SourceSystemKey])
    ) AS [t3]
Craig