In SQL Server 2005+
:
SELECT oo.*
FROM (
SELECT DISTINCT ProductId
FROM Orders
) od
CROSS APPLY
(
SELECT TOP 1 ProductID, Date, CustomerID
FROM Orders oi
WHERE oi.ProductID = od.ProductID
ORDER BY
Date DESC
) oo
Nominally, the plan for the query contains Nested Loops
.
However, the outer loop will use a Index Scan
with Stream Aggregate
, and the inner loop will contain an Index Seek
for the ProductID
with a Top
.
In fact, the second operation is almost free, since the index page used in the inner loop will most probably reside in the cache because it had just been used for the outer loop.
Here's the test result on 1,000,000
rows (with 100
DISTINCT
ProductID
's):
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 100)
Table 'Orders'. Scan count 103, logical reads 6020, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 234 ms, elapsed time = 125 ms.
, while this is a result of a mere SELECT DISTINCT
query:
SELECT od.*
FROM (
SELECT DISTINCT ProductId
FROM Orders
) od
And the statistics:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 100)
Table 'Orders'. Scan count 3, logical reads 5648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 250 ms, elapsed time = 125 ms.
As we can see, the performance is the same, and the CROSS APPLY
takes just 400
extra logical reads
(which most probably will never be physical
).
Don't see how it's possible to improve this query anymore.
Also the benefit of this query is that it parallelizes nicely. You may notice that CPU
time is twice the elapsed time
: it's due to parallelization on my old Core Duo
.
A 4-core
CPU
would complete this query in half of that time.
Solutions using window functions do not parallelize:
SELECT od.*
FROM (
SELECT ProductId, Date, CustomerID, ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Date DESC) AS rn
FROM Orders
) od
WHERE rn = 1
, and here are the statistics:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 100)
Table 'Orders'. Scan count 1, logical reads 5123, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 406 ms, elapsed time = 415 ms.