ansaurus

Question

How do aggregates (group by) work on SQL Server?

Answer 1

+9 A:

Usually it's a Stream Aggregate or a Hash Aggregate.

Stream aggregate sorts the resultset, scans it and returns every new value (not equal to the last in scan). It allows to keep but one set of the aggregate state variables.
Hash aggregate builds a hash table from the resultset. Each entry keeps the aggregate state variables which are initialized on hash miss and updated on hash hit.

Let's see how AVG works. It needs two state variables: sum and count

grouper  value
1        4
1        3
2        8
1        7
2        1
1        2
2        6
2        3

Stream Aggregate

First, it needs to sort the values:

grouper  value
1        4
1        3
1        7
1        2
2        8
2        1
2        6
2        3

Then, it keeps one set of state variables, initialized to 0, and scans the sorted resultset:

grouper  value  sum   count
-- Entered 
-- Variables:    0    0
1        4       4    1
1        3       7    2
1        7      14    3
1        2      16    4
-- Group change. Return the result and reinitialize the variables
-- Returning 1, 4
-- Variables:    0    0
2        8       8    1
2        1       9    2
2        6      15    3
2        3      18    4
-- Group change. Return the result and reinitialize the variables
-- Returning 2, 4.5
-- End

Hash aggregate

Just scanning the values and keeping the state variables in the hash table:

grouper  value
-- Hash miss. Adding new entry to the hash table
--             [1] (0,  0)
-- ... and updating it:
1        4     [1] (4,  1)
-- Hash hit. Updating the entry:
1        3     [1] (7,  2)
-- Hash miss. Adding new entry to the hash table
--             [1] (7,  2)  [2] (0, 0)
-- ... and updating it:
2        8     [1] (7,  2)  [2] (8,  1)
1        7     [1] (14, 3)  [2] (8,  1)
2        1     [1] (14, 3)  [2] (9,  2)
1        2     [1] (16, 4)  [2] (9,  2)
2        6     [1] (16, 4)  [2] (15, 3)
2        3     [1] (16, 4)  [2] (18, 4)
-- Scanning the hash table and returning the aggregated values
-- 1   4
-- 2   4.5

Usually, sort is faster if the resultset is already ordered (like, the values come out of the index or a resultset sorted by the previous operation).

Hash is faster is the resultset is not sorted (hashing is faster than sorting).

MIN and MAX are special cases, since they don't require scanning the whole group: only the first and the last value of the aggregated column within the group.

Unfortunately, SQL Server, unlike most other systems, cannot utilize this efficiently, since it's not good in doing INDEX SKIP SCAN (jumping over distinct index keys).

While simple MAX and MIN (without GROUP BY clause) use a TOP method if the index on the aggregated column is present, MIN and MAX with GROUP BY use same methods as other aggregate functions do.

Quassnoi 2009-09-24 11:45:58

Very very good answer :) Thank you very much.

uzay95 2009-09-24 12:30:32

Answer 2

A:

AS i dont have table available i tried with my custom table PRICE

I defined Primary key = ID_PRICE

Select PRICE.ID_PRICE, Max(PRICE.COSTKVARH) - min(PRICE.COSTKVARH) from PRICE
GROUP BY PRICE.ID_PRICE

Plan:

PLAN (PRICE ORDER PK_PRICE)

Adapted plan:

PLAN (PRICE ORDER PK_PRICE)

In your case p_id is a primary key so the adapted plan will be first order of patientsTable based on p_id then grouping and difference calculation will happen..

Jaswant Agarwal 2009-09-24 12:07:04

ansaurus

tags:

views:

answers:

How do aggregates (group by) work on SQL Server?

Stream Aggregate

Hash aggregate

related questions