I'm trying to debug a fairly complex stored procedure that joins across many tables (10-11). I'm seeing that for a part of the tree the estimated number of rows drastically differs from the actual number of rows - at its worst, SQL Server estimates that 1 row will be returned, when in actuality 55,000 rows are returned!

I'm trying to work out why this is - all of my statistics are up to date, and I've updated statistics with a FULLSCAN on several tables. I'm not using any user-defined functions or table variables. As far as I can see, SQL Server should be able to estimate exactly how many rows are going to be returned, but it continues to choose a plan which causes it to perform tens of thousands of RID lookups (when it is expecting to perform only 1 or 2).

What can I do to try and understand why the estimated number of rows is out by so much?

UPDATE: So looking at the plan I've found one node in particular which seems suspicious - it's a table scan on a table using the following predicate:

status <> 5
AND [type] = 1
OR [type] = 2

This predicate returns the entire table (630 rows - the table scan itself is NOT the source of the poor performance); however, SQL Server puts the estimated number of rows at just 37. SQL Server then goes on to drive several nested loops from this node onto RID lookups, index scans and index seeks. Could this be the source of my massive miscalculation? How do I get it to estimate a more sensible number of rows?
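
To see how far off the estimate for this predicate alone is, I've been running it in isolation and comparing the estimated and actual row counts per operator (the table name below is a hypothetical stand-in for the real one):

-- Returns the result set plus one row per plan operator, with both
-- the Rows (actual) and EstimateRows columns for comparison.
-- dbo.MyStatusTable is a placeholder for the scanned table.
SET STATISTICS PROFILE ON

SELECT *
FROM dbo.MyStatusTable
WHERE status <> 5
  AND [type] = 1
   OR [type] = 2

SET STATISTICS PROFILE OFF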

A: 

Since you already updated the statistics, I'd try to eliminate any parameter sniffing:

CREATE PROCEDURE xyz
(
    @param1 int
    ,@param2 varchar(10)
) AS

-- Copy the parameters into local variables: the optimizer cannot
-- sniff local variables, so it costs the plan from average density
-- rather than from whatever values happened to be passed in first.
DECLARE @param_1 int
       ,@param_2 varchar(10)

SELECT @param_1=@param1
      ,@param_2=@param2

...complex query here....
...WHERE column1=@param_1 AND column2=@param_2....

go
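
If masking the parameters this way isn't enough, another option (on SQL Server 2005 and later) is to request a statement-level recompile, so the plan is re-costed against the actual values on every call - a sketch, reusing the placeholder query above:

-- Trades a recompile on each execution for a plan built
-- from the current parameter values.
...complex query here....
...WHERE column1=@param1 AND column2=@param2
OPTION (RECOMPILE)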
KM
+1  A: 

SQL Server uses statistics, which it keeps for each index.

(You can also create statistics on non-indexed columns)
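
For example (a sketch - the statistics name, table and column here are made up):

-- Column-level statistics on a non-indexed column, built from a
-- full scan of the table for maximum accuracy.
CREATE STATISTICS st_MyTable_status
ON dbo.MyTable (status)
WITH FULLSCAN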

To update the statistics on every table in a database (WARNING: this will take some time on very large databases. Don't do this on Production servers without checking with your DBA...):

exec sp_msforeachtable 'UPDATE STATISTICS ?'
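
If the default sampling still misleads the optimizer, you can pay for a full scan of every row instead (slower still, same warning applies):

exec sp_msforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN'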

If you don't have a regularly scheduled job to rebuild your most active indexes (i.e. those taking lots of INSERTs or DELETEs), you should consider rebuilding your indexes (same caveat as above applies):

exec sp_msforeachtable "DBCC DBREINDEX('?')"
Mitch Wheat
+3  A: 

SQL Server splits the statistics for each index into up to 200 ranges, keeping the following data for each range (these are the histogram columns that DBCC SHOW_STATISTICS reports; see the sketch after the list):

  • RANGE_HI_KEY

    A key value showing the upper boundary of a histogram step.

  • RANGE_ROWS

    Specifies how many rows are inside the range (smaller than this RANGE_HI_KEY, but bigger than the previous RANGE_HI_KEY; both boundaries are excluded).

  • EQ_ROWS

    Specifies how many rows are exactly equal to RANGE_HI_KEY.

  • AVG_RANGE_ROWS

    Average number of rows per distinct value inside the range.

  • DISTINCT_RANGE_ROWS

    Specifies how many distinct key values are inside this range (not counting the previous RANGE_HI_KEY or this RANGE_HI_KEY itself).
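
You can dump this histogram for any statistics object yourself (the table and index names below are placeholders):

-- Without the WITH clause this also shows the header and density
-- vector; WITH HISTOGRAM limits the output to the histogram steps.
DBCC SHOW_STATISTICS ('dbo.MyTable', 'IX_MyTable_Key') WITH HISTOGRAM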

Usually, the most populated values are chosen as the RANGE_HI_KEY boundaries.

However, a heavily populated value can end up inside a range instead, and this skews the distribution statistics.

Imagine this data (among other values):

Key value   Count of rows
1           1
2           1
3           10000
4           1

SQL Server usually builds two ranges: 1 to 3 and 4 to the next populated value, which makes these statistics:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
3             2           10000    1               2

This means that when searching for, say, 2, the estimate is AVG_RANGE_ROWS = 1 row, so it's better to use index access.

But if 3 ends up inside the range instead of on a boundary, the statistics are these:

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
4             10002       1        3334            3

The optimizer now thinks there are AVG_RANGE_ROWS = 10002 / 3 = 3334 rows for the key 2 and decides that index access is too expensive.

Quassnoi
Thanks - very helpful post and link.
Kragen
How can we solve this problem if even updating statistics with a full scan doesn't fix it?
Maysam
@Maysam: you can use `CREATE STATISTICS` for the predicates you use often.
Quassnoi
A: 

Rebuilding your indexes might resolve the incorrect estimated-rows issue: an index rebuild also rebuilds the index's statistics, with the equivalent of a full scan.
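
A sketch for a single table (the table name is a placeholder; on SQL Server 2005 and later, ALTER INDEX is the replacement for the deprecated DBCC DBREINDEX):

-- Rebuilds every index on the table and, as a side effect,
-- refreshes their statistics with a full scan.
ALTER INDEX ALL ON dbo.MyTable REBUILD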