ansaurus

Question

SQL Statistics and Indexes - How detailed are they?

Answer 1

+1 A:

It creates a histogramm and will thus use that.

With a bit column it will have a good idea how many are 0 and 1

With a string column, it iwll have a rough idea of "bands" (value starting a, b, c etc.). Same for numbers (it creates x bands of value ranges).

Just look up how statistics look in your management studio - you can actually access the histograms.

TomTom 2010-10-22 17:22:04

Answer 2

+3 A:

You can see for yourself:

CREATE TABLE IndexTest (
Id int not null primary key identity(1,1),
Active bit not null default(0),
IndexedValue nvarchar(10) not null
)

CREATE INDEX IndexTestActive ON IndexTest (Active)
CREATE INDEX IndexTestIndexedValue ON IndexTest (IndexedValue)

DECLARE @values table
(
    Id int primary key IDENTITY(1, 1),
    Value nvarchar(10)
)

INSERT INTO @values(Value) VALUES ('1')
INSERT INTO @values(Value) VALUES ('2')
INSERT INTO @values(Value) VALUES ('3')
INSERT INTO @values(Value) VALUES ('4')
INSERT INTO @values(Value) VALUES ('5')
INSERT INTO @values(Value) VALUES ('Many')
INSERT INTO @values(Value) VALUES ('Many')
INSERT INTO @values(Value) VALUES ('Many')
INSERT INTO @values(Value) VALUES ('Many')
INSERT INTO @values(Value) VALUES ('Many')

DECLARE @rowCount int
SET @rowCount = 100000

WHILE(@rowCount > 0)
BEGIN
    DECLARE @valueIndex int
    SET @valueIndex = CAST(RAND() * 10 + 1 as int)
    DECLARE @selectedValue nvarchar(10)
    SELECT @selectedValue = Value FROM @values WHERE Id = @valueIndex
    DECLARE @isActive bit
    SELECT @isActive = CASE 
            WHEN RAND() < 0.001 THEN 1 
            ELSE 0
           END
    INSERT INTO IndexTest(Active, IndexedValue) VALUES (@isActive, @selectedValue)
    SET @rowCount = @rowCount - 1
END

SELECT count(*) FROM IndexTest WHERE Active = 1
SELECT count(*) FROM IndexTest WHERE Active = 0

SELECT count(*) FROM IndexTest WHERE IndexedValue = '1'
SELECT count(*) FROM IndexTest WHERE IndexedValue = 'Many'

It looks to me like it always uses the indexes on this query plan:

query plan

codekaizen 2010-10-22 17:38:38

Thanks. This helps. If I use where Active=0 and IndexedValue='1' it will do a table scan whereas Active=1 and IndexValue='1' will use the index.

sgmoore 2010-10-22 18:09:57

Answer 3

+1 A:

You can simply look at the statistics and see for yourself :) DBCC SHOW_STATISTICS. See the Remarks section, it has a nice explanation of how the histograms are actually stored and used:

To create the histogram, the query optimizer sorts the column values, computes the number of values that match each distinct column value and then aggregates the column values into a maximum of 200 contiguous histogram steps. Each step includes a range of column values followed by an upper bound column value. The range includes all possible column values between boundary values, excluding the boundary values themselves. The lowest of the sorted column values is the upper boundary value for the first histogram step.

alt text

For each histogram step:

Bold line represents the upper boundary value (RANGE_HI_KEY) and the number of times it occurs (EQ_ROWS)

Solid area left of RANGE_HI_KEY represents the range of column values and the average number of times each column value occurs (AVG_RANGE_ROWS). The AVG_RANGE_ROWS for the first histogram step is always 0.

Dotted lines represent the sampled values used to estimate total number of distinct values in the range (DISTINCT_RANGE_ROWS) and total number of values in the range (RANGE_ROWS). The query optimizer uses RANGE_ROWS and DISTINCT_RANGE_ROWS to compute AVG_RANGE_ROWS and does not store the sampled values.

The query optimizer defines the histogram steps according to their statistical significance. It uses a maximum difference algorithm to minimize the number of steps in the histogram while maximizing the difference between the boundary values. The maximum number of steps is 200. The number of histogram steps can be fewer than the number of distinct values, even for columns with fewer than 200 boundary points. For example, a column with 100 distinct values can have a histogram with fewer than 100 boundary points.

Remus Rusanu 2010-10-22 18:03:10

ansaurus

tags:

views:

answers:

SQL Statistics and Indexes - How detailed are they?

related questions