ansaurus

Question

Performing Aggregate Functions on Multi-Million Row Tables

Answer 1

+1 A:

I've not yet read through your whole question (I'll come to that shortly), but to answer an early comment: you can use partitioned views in SQL Server 2008 Standard edition. It's partitioned tables (which are admittedly more flexible) that are restricted to Enterprise edition.

Partitioned views info: http://msdn.microsoft.com/en-us/library/ms190019.aspx
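As a rough illustration of what a partitioned view could look like here, the sketch below splits the daily log by month. The member table names, the view name, and the column types are all assumptions made for the example; only the column names Inv_ID, LogDay and LogCount come from the question.

```sql
-- Hypothetical member tables, one per month. The CHECK constraint on
-- LogDay tells the optimizer which date range each table can hold.
CREATE TABLE [dbo].[LogInvSearches_Daily_2010_04]
(
    [Inv_ID]   INT      NOT NULL,   -- assumed type
    [LogDay]   DATETIME NOT NULL    -- assumed type
        CHECK ([LogDay] >= '20100401' AND [LogDay] < '20100501'),
    [LogCount] INT      NOT NULL,   -- assumed type
    CONSTRAINT [PK_LogInvSearches_Daily_2010_04]
        PRIMARY KEY ([Inv_ID], [LogDay])
);

CREATE TABLE [dbo].[LogInvSearches_Daily_2010_05]
(
    [Inv_ID]   INT      NOT NULL,
    [LogDay]   DATETIME NOT NULL
        CHECK ([LogDay] >= '20100501' AND [LogDay] < '20100601'),
    [LogCount] INT      NOT NULL,
    CONSTRAINT [PK_LogInvSearches_Daily_2010_05]
        PRIMARY KEY ([Inv_ID], [LogDay])
);
GO

-- The partitioned view is simply a UNION ALL over the member tables.
CREATE VIEW [dbo].[LogInvSearches_Daily_All]
AS
SELECT [Inv_ID], [LogDay], [LogCount]
FROM [dbo].[LogInvSearches_Daily_2010_04]
UNION ALL
SELECT [Inv_ID], [LogDay], [LogCount]
FROM [dbo].[LogInvSearches_Daily_2010_05];
GO
```

Queries would then select from the view instead of the base table. Whether this actually helps depends on how well the date filter lines up with the member tables, so treat it as something to test rather than a guaranteed win.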

On the wider question, I'd like to know if you really need the DENSE_RANK in there. I'm wondering if you're confusing the ORDER BY inside the DENSE_RANK with the ORDER BY of the query itself. As it stands, your TOP 5 will return five arbitrary records, since SQL Server does not guarantee any order on records unless an ORDER BY clause is specified on the query (which you've not done). If you move the ORDER BY from the DENSE_RANK down to be the whole query's ORDER BY as follows, the records will come out as I think you want, and it will remove the need for the expensive DENSE_RANK ranking function.

SELECT TOP 5
    SUM([LogCount]) AS [Views],
    [Inv_ID]
FROM [LogInvSearches_Daily] D (NOLOCK)
WHERE 
    [LogDay] > DateAdd(d, -30, getdate())
    AND EXISTS(
        SELECT *
        FROM Inventory (NOLOCK)
        WHERE Acct_ID = 18731
            AND Inv_ID = D.Inv_ID
    )
GROUP BY
    Inv_ID
ORDER BY
    [Views] DESC,
    [Inv_ID]

UPDATE:

The time is probably being used up here:

|--Sort(ORDER BY:([D].[Inv_ID] ASC))

You could try creating a covering index like this one:

CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_Perf] ON [dbo].[LogInvSearches_Daily] 
(
    [Inv_ID] ASC,
    [LogDay] ASC
)
INCLUDE
(
    [LogCount]
)

Note that I've also altered the ORDER BY slightly (Inv_ID is now sorted ASC instead of DESC). I suspect this change won't affect the results in a problematic way, but it may help performance since it will be returning rows in the same order that they are grouped (although this may be irrelevant!).

Daniel Renshaw 2010-05-12 16:33:22

DENSE_RANK() or not, the results are still just as slow. I've tried it both ways, and I still can't get this to load any faster than 24 seconds. Updated to show query plan and time for the same query without the DENSE_RANK()

Daniel Short 2010-05-12 18:14:09

I've updated my answer with an index suggestion

Daniel Renshaw 2010-05-12 18:33:13

I think that index will do the trick. Now I just have to figure out how to get the index created without bringing down the entire server...

Daniel Short 2010-05-13 15:47:29

Answer 2

A:

Partitioning aside,

Based on our experience with larger tables than yours, we extract data into a temp table (not a table variable) and aggregate on that. Not for all queries, but for the more complex ones.

Other than that, I agree with Daniel Renshaw's observations about DENSE_RANK.

I'd also think about moving [Inv_ID], [LogCount] into the index key (rather than as INCLUDE columns, and perhaps with a DESC sort).
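A minimal sketch of the temp table approach, reusing the table and column names from the query in Answer 1. The temp table name #AcctLog and the dbo schema on Inventory are assumptions for illustration.

```sql
-- Step 1: stage only the rows for this account and date range.
SELECT D.[Inv_ID], D.[LogCount]
INTO #AcctLog                      -- hypothetical temp table name
FROM [dbo].[LogInvSearches_Daily] AS D
WHERE D.[LogDay] > DATEADD(DAY, -30, GETDATE())
    AND EXISTS (
        SELECT 1
        FROM [dbo].[Inventory] AS I  -- schema assumed to be dbo
        WHERE I.[Acct_ID] = 18731
            AND I.[Inv_ID] = D.[Inv_ID]
    );

-- Step 2: aggregate over the much smaller staged set.
SELECT TOP (5)
    SUM([LogCount]) AS [Views],
    [Inv_ID]
FROM #AcctLog
GROUP BY [Inv_ID]
ORDER BY [Views] DESC, [Inv_ID];

-- Step 3: clean up.
DROP TABLE #AcctLog;
```

The idea is that the expensive filter runs once and the aggregate then works on a small set of rows. Whether it beats the single-statement version here would need measuring.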

gbn 2010-05-12 17:03:50

Well this is the aggregate table... We have a ms by ms table, and then this rolls all of those requests up into days. Which I'm now attempting to query. I can't break it down by more than this, since these will be dynamic queries run by users on demand for their accounts.

Daniel Short 2010-05-12 18:21:53

Answer 3

A:

Acct_ID is on the Inventory table, and seems to have an index to itself (IX_Inventory_Acct_ID). Perhaps if Inventory had an index on (Acct_Id, Inv_Id) and LogInvSearches_Daily was clustered (or at least indexed) around (Inv_Id, LogDay), you'd have more luck.
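The two indexes described above might look something like this. The index names are made up for the example, and the INCLUDE of LogCount is borrowed from the covering index in Answer 1 rather than being part of this suggestion.

```sql
-- Lets the account lookup return Inv_ID values straight from the index.
CREATE NONCLUSTERED INDEX [IX_Inventory_Acct_ID_Inv_ID]   -- hypothetical name
ON [dbo].[Inventory] ([Acct_ID], [Inv_ID]);

-- Nonclustered variant of the (Inv_ID, LogDay) idea; clustering the
-- table on these columns instead would mean rebuilding the table.
CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_Inv_ID_LogDay]   -- hypothetical name
ON [dbo].[LogInvSearches_Daily] ([Inv_ID], [LogDay])
INCLUDE ([LogCount]);
```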

BTW, I've no idea what the current clustered index on LogInvSearches_Daily.ID is supposed to be buying you. Why is it important to have records with close IDs close together on the disk?

Matthew Flynn 2010-05-12 19:43:21
