I have written a script to find the selectivity of each column in every table, where Selectivity = Distinct Values / Total Number of Rows. Some of the tables have fewer than 100 rows, yet the selectivity of some of their columns is more than 50%. Are those columns eligible for an index? Or can you tell me the minimum number of rows a table needs before creating an index makes sense?

+3  A: 

You can index on any column - the question is whether it makes any sense and whether that index will be used....

Typically, a selectivity of less than 1-5% might work, where selectivity means the percentage of rows that a single value selects out of the total - the smaller that percentage, the better. The best case is a single value out of a large population, e.g. a single customer ID out of hundreds of thousands - those indices will definitely be used.
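
The question and this answer use "selectivity" in opposite senses, so it may help to compute both numbers side by side. The following is only a rough sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the `customers` table, its columns, and the helper `column_stats` are hypothetical names made up for illustration.

```python
import sqlite3

def column_stats(conn, table, column):
    """Return (distinct_ratio, avg_fraction_per_value) for one column."""
    # Identifiers cannot be bound as parameters; this sketch assumes trusted names.
    total, distinct = conn.execute(
        f'SELECT COUNT(*), COUNT(DISTINCT "{column}") FROM "{table}"'
    ).fetchone()
    if total == 0 or distinct == 0:
        return 0.0, 0.0
    # The question's definition: distinct values / total rows (high is good).
    distinct_ratio = distinct / total
    # The answer's sense: average share of rows one value matches (low is good).
    # Assumes the column contains no NULLs.
    avg_fraction_per_value = 1.0 / distinct
    return distinct_ratio, avg_fraction_per_value

# Hypothetical sample data: a unique id and a two-valued gender column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, "M" if i % 2 else "F") for i in range(1000)],
)

id_stats = column_stats(conn, "customers", "id")
gender_stats = column_stats(conn, "customers", "gender")
print("id:    ", id_stats)
print("gender:", gender_stats)
```

With this sample data, a single `id` value matches one row in a thousand, while a single `gender` value matches half the table, which is why the second column is a poor index candidate on its own.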

Things like gender (only 2 values) or other things that only have a very limited number of possible values typically don't work well on an index. At least on their own - these columns might be ok to be included into another index as a second or third column.

But really, the only way to find out whether or not an index makes sense is to

  • measure your queries before
  • create the index
  • run your queries again, check their execution plans, measure their timings
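
As a rough illustration of those three steps, here is a minimal sketch that again uses Python's built-in sqlite3 module rather than SQL Server. The `orders` table, the index name `ix_orders_customer`, and the helpers `plan` and `timing` are all hypothetical; on SQL Server you would look at the actual execution plan and the query timings in place of `EXPLAIN QUERY PLAN` and `time.perf_counter`.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 2000, i * 0.5) for i in range(20000)],
)

QUERY = "SELECT COUNT(*), SUM(amount) FROM orders WHERE customer_id = ?"

def plan(conn):
    """Return SQLite's query plan for QUERY as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (55,)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

def timing(conn, repeats=20):
    """Return the wall-clock seconds needed to run QUERY `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(QUERY, (55,)).fetchall()
    return time.perf_counter() - start

# 1. Measure the query before the index exists.
plan_before, time_before = plan(conn), timing(conn)

# 2. Create the index.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")

# 3. Run the query again, check the plan, and compare the timings.
plan_after, time_after = plan(conn), timing(conn)

print("before:", plan_before, f"{time_before:.4f}s")
print("after: ", plan_after, f"{time_after:.4f}s")
```

The point is the comparison, not the absolute numbers: if the plan after step 2 does not mention the new index, or the timings do not improve, the index is probably not worth keeping.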

There's no golden rule as to when an index will be used (or ignored) - too many variables play into that decision.

For some expert advice on how to deal with indices, and how to find out which indices might not get used, and when it makes sense to create an index, see Kimberly Tripp's blog posts:

marc_s
I have a table with 3 integer values, and all of them are distinct. Its selectivity is more than 95%, and the table is mostly used in SELECT statements only. So is it feasible to create an index on it?
Paresh
What does 95% selective mean? Typically, you want a very low selectivity - you want a single value (ID = 55) to select only a minimal number of rows. If your selectivity in that scenario (the percentage of rows out of the total that will be selected for a given value of your field) is under 5%, or better still under 1%, then it definitely makes sense to index.
marc_s
A: 

I'm not sure about SQL Server, but most DBMSs don't use an index for retrieval if they can retrieve all of the table's rows in a single I/O. You will see this in the plan explanations: some tables are always tablespace scanned.

IMHO, any table with fewer than 5000 rows is not worth analysing for cardinality if the DBMS is running on a server.

Steve De Caux
+1  A: 

Most DBMSs use a cache for data and code (stored procedures, execution plans, etc.). In SQL Server I think these are called the buffer cache and the procedure (plan) cache, and in Oracle they are the buffer cache and the shared pool, both part of the SGA. Table data and/or index data can be in the cache.

Small tables which are frequently accessed will most likely fit in the cache. But a table can be evicted from the cache, say, if a query loads fresh data from the disk. There are options to indicate that you want a table to stay permanently in the cache (see DBCC PINTABLE, although it no longer has any effect in SQL Server 2005 and later). That's maybe a better strategy than using an index if your table is very small (which is your case). Adding an index (which would also always be in the cache) could help further, but I don't know what the gain would be.

The big difference in performance is disk access vs. memory access. The purpose of an index is to reduce the amount of data to read from the disk, but if the data is already in memory, the gain is probably small.

ewernli