views:

886

answers:

2

We have a transact sql statement that queries 4 tables with millions of rows in each.

It takes several minutes, even though it has been optimized with indexes and statistics according to TuningAdvisor.

The structure of the query is like:

SELECT E.EmployeeName
    , SUM(M.Amount) AS TotalAmount
    , SUM(B.Amount) AS BudgetAmount
    , SUM(T.Hours) AS TotalHours
    , SUM(TB.Hours) AS BudgetHours
    , SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
FROM Employees E
LEFT JOIN MoneyTransactions M
    ON E.EmployeeID = M.EmployeeID
LEFT JOIN BudgetTransactions B
    ON E.EmployeeID = B.EmployeeID
LEFT JOIN TimeTransactions T
    ON E.EmployeeID = T.EmployeeID
LEFT JOIN TimeBudgetTransactions TB
    ON E.EmployeeID = TB.EmployeeID
GROUP BY E.EmployeeName

Since each transaction table contains millions of rows, I consider splitting it up into one query per transaction table, using table variables like @real, @budget, and @hours, and then joining these in a final SELECT. But in tests it seems to not speed up.

How would you deal with that in order to speed it up?

+7  A: 

I'm not sure the query you posted will yield the results you're expecting.

It will cross join all the dimension tables (MoneyTransactions etc.) and multiply all the results.

Try this:

SELECT  E.EmployeeName,
        (
        SELECT  SUM(amount)
        FROM    MoneyTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS TotalAmount,
        (
        SELECT  SUM(amount)
        FROM    BudgetTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS BudgetAmount,
        (
        SELECT  SUM(hours)
        FROM    TimeTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS TotalHours,
        (
        SELECT  SUM(hours)
        FROM    TimeBudgetTransactions m
        WHERE   M.EmployeeID = E.EmployeeID
        ) AS BudgetHours
FROM    Employees E
Quassnoi
Hmmm... Isn't that SELECT EmployeeID, EmployeeName, SUM(...), SUM(...) FROM Employees GROUP BY EmployeeID, EmployeeName?
Tomalak
Why group on primary key?
Quassnoi
That's a mistake (sloppy reading). Never mind, my bad. +1
Tomalak
@Quassnoi: Thanks. I thought nested queries (SELECTs inside SELECT) would be slower than JOINs... Have not tried your suggestion yet...
Ole Lynge
No, they won't. Actually, the query you posted is slow because it makes lots of unnesessary joins and produces the incorrect results. If there are 100 rows per employee in each transaction table, you'll get 10,000,000 rows for each employee in a result, which is most probably not what you want.
Quassnoi
Thanks. Of course. I must be bombed after lunch...
Ole Lynge
+1  A: 

I don't know if you have all the indexes on your tables that will speed up things, but having big tables could have this impact on a query time. I would recommend partitioning the tables if possible. It is more work, but everything you do to speed up the query now it won't be enough after few millions new records.

Bojan Milenkoski