views:

96

answers:

1

How does SQL Server determine whether a table column has low cardinality?

The reason I ask is because query optimizer would most probably not use an index on a gender column (values 'm' and 'f'). However how would it determine the cardinality of the gender column to come to that decision?

On top of this, if in the unlikely event that I had a million entries in my table and only one entry in the gender column was 'm', would SQL server be able to determine this and use the index to retrieve that single row? Or would it just know there are only 2 distinct values in the column and not use the index?

I appreciate the above discusses some poor db design, but I'm just trying to understand how query optimizer comes to its decisions.

Many thanks.

+1  A: 

See Statistics Used by the Query Optimizer in Microsoft SQL Server 2005 .

With 1 value 'm' and 999999 'f' the statistics will give a cardinality estimate of 1 for 'm', and something close to 1M for 'f'. But that whether the index will be used or not, there are more factors.

In general such a low selectivity column does not make sense on an index alone. However, it does make sense as a leftmost column on a more complex index, and even as a leftmost column on the clustered index. And even if a column would make sense for 'm' and not for 'f', the query auto-parametrization may play a trick on you and generate a plan for a variable @gender instead.

You'll have to either read more or give more details. Some good resources are the QO team and team members blogs:

Remus Rusanu
Cardinality or selectivity? I always thought cardinality was the numbers in a relationship, e.g. a 0:m, or a 0:1 relationship between tables - that's cardinality. For indices, I have always heard the term "selectivity" - how many rows will a given value select (approx.)
marc_s
"The total number of rows processed at each level of a query plan, referred to as the cardinality of the plan." http://msdn.microsoft.com/en-us/library/ms181034.aspx "Selectivity is a property that relates to how many rows are typically identified by a key value" http://msdn.microsoft.com/en-us/library/aa174544%28SQL.80%29.aspx
Remus Rusanu
Afaik. selectivity is a property of the index (ie. how many distinct values exist in the key) vs. cardinality being the result of executing a plan snippet (ie. how many rows will a WHERE clause return)
Remus Rusanu
But you are right that the 'cardinality' term is also used in data modeling to refer to relations, http://en.wikipedia.org/wiki/Cardinality_%28data_modeling%29. But is also used to refer to the uniqueness of a column http://en.wikipedia.org/wiki/Cardinality_%28SQL_statements%29. In QO articles will usually mean the later.
Remus Rusanu