views:

69

answers:

3

I have created script to find selectivity of each columns for every tables. In those some tables with less than 100 rows but selectivity of column is more than 50%. where Selectivity = Distinct Values / Total Number Rows

So, is those column are eligible for index? Or, can you tell, how much minimum rows require for eligibility for creating index?

+2  A: 

You can create a index on a table with 0 rows, 1 row or a 100 million rows. You can create an index where every column has the same value or unique values.

So you can create an index. The question is really should you create an index and no tool is going to tell you that because indexes can also be multi-value and it depends on what queries you run. Creating indexes is something done when performance tuning queries or preemptively when you know that you'll be creating queries that are using it.

Every index comes with a cost in terms of space and time required to do updates, inserts and deletes. You don't want to be creating them spuriously so you're really going to have to do this by hand, not as a result of a script to see how unique the value of a column is.

cletus
The way you started this post, I really thought you were going Green Eggs and Ham on us. Sadly, I was mistaken. But, good advice--make sure you use the performance tuning features of your RDBMS before creating indices!
Eric
Thank to all for your answer.
Paresh
+4  A: 

I think I understand what you are trying to accomplish by calculating a 'Selectivity' value for your data but you cannot apply the rule blindly.

In fact in for certain queries the 'Selectivity' value might be really low an index will still be very beneficial. For example:
Assume a 'inbox' table with millions of rows, these rows have a 'Read' boolean field. In this case the distinct values over the number of rows will be really low. If most items are read most of the time then finding unread items with an index on this field will be very efficient.

Creating indexes index come at a cost. Although you get the benefit for reads, you pay for writes and disk usage.

I would rather recommend you profile your queries and index accordingly. You can also look at the data from *sys.dm_db_missing_index_group_stats* and other Dynamic management views that will give you insight on indexes usage (or missing) ones.

Philip Fourie
A: 

A general rule of thumb says that if you have a very large table (over 1 million rows), you should only use an index if a WHERE clause based on that index selects at most something in the neighborhood of 1-2% of the data.

If you have a "gender" column and roughly 50% of values are "male" and roughly 50% "female", then having an index on that really doesn't give you much - SQL Server and most other RDBMS will most likely still do a full table scan in this case, since on average, they'd have to scan at least half the table anyway, so the "detour" by using an index first and then looking up the actual full data based on that index value is just not worth it.

An index is excellent if you have something like unique keys (customer number), or a value that is quite selective. An index is not without cost - it uses up disk space, it needs to be maintained, it will slightly slow down all operations besides the SELECT - so thread carefully, it's not the best idea to just blindly index everything. Having too few indices is bad - but having too many, and the wrong ones, can be even worse! :-) Nobody ever claimed getting your indices right was easy.... :-)

But there's definitely help out there - the best source I know are Kimberly Tripp's excellent blog posts on SQL Server indexing (and many other topics).

Marc

marc_s