I have a table with a potential of up to 5,000,000 rows. One of the columns in this table is used alone in queries, but there are only 5 possible values for this column. Currently I have 10,000 rows, and according to the explain plan it makes no sense to use my index on that column.

Will it ever make sense to use the index, or shouldn't I bother with one?

Edit: Here are the two explain plans at the moment: one without the index, and one where I force the usage of the index with a hint (the latter image).

+1  A: 

If it is going to increase in size as you mention

up to 5,000,000 rows

I would recommend creating an index.

astander
A: 

The worst you can do is try it out and see if it makes a difference.

Just an idea, but given that the table can grow into the millions of rows and the column has only 5 different values, would it be worth splitting that huge table into 5 separate tables, with each table representing one of those values?

kevchadders
I'm already there (trying it out); see the image. :) Splitting into more tables is not a possibility at the moment, and it would break other queries, so I don't think it's a good idea.
svrist
If you do not have the Partitioning Option and you're tempted into going down the route of different tables then you should first look at implementing a "partition view" -- it's a bit of an obsolete construct now but was used before Partitioning was available and is pretty effective. In some cases it's more effective than a partitioned table.
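
A rough sketch of the partition-view idea, assuming a FREQUENCYID column with values 1 to 5 (all names here are made up): each per-value table gets a CHECK constraint, and a UNION ALL view stitches them back together so the optimizer can skip branches whose constraint contradicts the predicate.

    -- Hypothetical partition view over per-value tables (two of five shown)
    CREATE TABLE mytable_f1 (
        frequencyid NUMBER CHECK (frequencyid = 1),
        othercolumn VARCHAR2(100)
    );
    CREATE TABLE mytable_f2 (
        frequencyid NUMBER CHECK (frequencyid = 2),
        othercolumn VARCHAR2(100)
    );

    CREATE OR REPLACE VIEW mytable_pv AS
    SELECT frequencyid, othercolumn FROM mytable_f1
    UNION ALL
    SELECT frequencyid, othercolumn FROM mytable_f2;

    -- With the constraints in place, a query such as
    --   SELECT * FROM mytable_pv WHERE frequencyid = 2
    -- only needs to touch the branch that can contain that value.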
David Aldridge
+5  A: 

It depends on a couple of things.

Firstly, the distribution of values. If you only have five distinct values but one of them accounts for 99.9999% of rows in the table then obviously you would not want the optimiser to use the index for that value but you might want it to use it for the others. In some cases like this it's worth using a function-based index to ensure that you only index the values of interest and not the ones that are just taking up space.
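
A sketch of the function-based-index idea, assuming (purely for illustration) that FREQUENCYID = 0 is the overwhelmingly common value you would never access via the index: Oracle does not store entirely-NULL keys in a B-tree index, so a CASE expression that returns NULL for the common value keeps those rows out of the index altogether.

    -- Index only the rare values; rows where the CASE yields NULL are not indexed
    CREATE INDEX idx_mytable_rare_freq
        ON mytable (CASE WHEN frequencyid <> 0 THEN frequencyid END);

    -- Queries must use the same expression for the index to be considered
    SELECT *
    FROM   mytable
    WHERE  CASE WHEN frequencyid <> 0 THEN frequencyid END = 2;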

Secondly, are there queries that can be answered using that index without accessing the table?

Note that it's not just the percentage of rows that will be accessed that matters, but the number of blocks of the table that will need to be accessed. For example, if you have a table of 1000 blocks with 30 rows per block on average, and one column has 30 distinct values (each one being present in 1000 rows), then the number of blocks that need to be visited to read every row for a single value varies between 1000/30 ≈ 34 (worth using an index) and 1000 (not worth using an index), depending on how the rows are distributed. This is expressed by the clustering factor of the index -- if its value is close to the number of rows in the table then the index is less likely to be used, and if it's close to the number of blocks then it's more likely to be used.
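
If you want to check this, the clustering factor is visible in the data dictionary and can be compared against the table's block and row counts (the index name below is a placeholder):

    -- Compare the index's clustering factor with the table's blocks and rows
    SELECT i.index_name,
           i.clustering_factor,
           t.blocks,
           t.num_rows
    FROM   user_indexes i
    JOIN   user_tables  t ON t.table_name = i.table_name
    WHERE  i.index_name = 'IDX_MYTABLE_FREQ';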

Also, you might look at index compression to see if that saves you space.
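
With only five distinct keys in a single-column index, key compression should be very effective. Assuming the index is called IDX_MYTABLE_FREQ (a made-up name), either of these would do it:

    -- Rebuild the existing index with key compression on its single column
    ALTER INDEX idx_mytable_freq REBUILD COMPRESS 1;

    -- Or create it compressed in the first place
    CREATE INDEX idx_mytable_freq ON mytable (frequencyid) COMPRESS 1;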

Be careful with bitmap indexes -- they are not friendly to systems where they are subject to modification by multiple sessions at the same time (eg. two people both inserting rows at the same time into the indexed table).
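
For reference, the bitmap variant would simply be the following (names made up); as noted, it is only really suitable for tables that are not modified concurrently by multiple sessions, such as read-mostly or batch-loaded tables:

    -- Bitmap index on the low-cardinality column; avoid it under concurrent DML
    CREATE BITMAP INDEX bix_mytable_freq ON mytable (frequencyid);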

A more effective strategy, if you do want to improve the efficiency of queries with predicates on these five values, is to use partitioning: partly because of partition pruning in the query, but also because of the improvement in statistics available to the optimiser when it knows that only one partition will be accessed and can use partition-level statistics instead of global statistics.
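
Assuming the Partitioning option is licensed and the column really only ever holds the five known values (names and values below are illustrative), list partitioning would look like this:

    -- List-partition the table on the five known values
    CREATE TABLE mytable (
        frequencyid NUMBER        NOT NULL,
        othercolumn VARCHAR2(100)
    )
    PARTITION BY LIST (frequencyid) (
        PARTITION p_freq1 VALUES (1),
        PARTITION p_freq2 VALUES (2),
        PARTITION p_freq3 VALUES (3),
        PARTITION p_freq4 VALUES (4),
        PARTITION p_freq5 VALUES (5)
    );

    -- A predicate on FREQUENCYID now prunes the scan to a single partition
    SELECT * FROM mytable WHERE frequencyid = 3;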

David Aldridge
+1  A: 

Possibly the easiest way is not to guess but to actually try it.

But it seems to me that you're comparing execution plans in order to find the best approach. That's not reliable: the optimizer may not have the appropriate information to choose the best plan (for example, if you have a non-uniform distribution of values and haven't gathered a histogram). Also, looking at the "cost" figure in an explain plan makes no sense for this comparison.

The better way is to compare logical I/Os. Run SQL*Plus, issue SET AUTOTRACE TRACEONLY, then run your query (with and without the index) and compare the "consistent gets" numbers. The lower, the better.
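
For example (table and column names are placeholders):

    -- In SQL*Plus: suppress the result rows, show execution statistics instead
    SET AUTOTRACE TRACEONLY

    -- Run the query each way (e.g. with and without forcing the index)
    -- and compare the "consistent gets" figure in the statistics section
    SELECT * FROM mytable WHERE frequencyid = 2;

    SET AUTOTRACE OFF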

About importance of LIOs: article by Cary Millsap.

egorius
A: 

Test it with typical queries, see which way is faster.

You may find that a Full Table Scan is faster on average than an Index Range Scan + Table Access by Rowid - in which case Oracle got it right.

On the other hand, perhaps there are patterns in the data such that for most of your queries it's better to use the index - in which case you'll probably want to add the INDEX hint.
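
If it comes to that, the hint looks like this (the table alias and index name are placeholders):

    -- Force the optimizer to use a specific index
    SELECT /*+ INDEX(t idx_mytable_freq) */ *
    FROM   mytable t
    WHERE  frequencyid = 2;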

Jeffrey Kemp
+1  A: 

The index will be useful in the following cases:

  • When you search for infrequent FREQUENCYID values. For example, only 10 of your 10,000,000 rows have FREQUENCYID = 1 and you search for that value.

  • When you do not use other columns except FREQUENCYID in your queries. This query:

    SELECT  FREQUENCYID, COUNT(*)
    FROM    mytable
    GROUP BY
            FREQUENCYID
    

    will benefit from the index (actually, INDEX FAST FULL SCAN along with HASH AGGREGATE will most probably be used)

  • When your table rows are large and all the columns you use in the query are indexed. In that case the indexes can be joined instead of performing a FULL TABLE SCAN. Say, this query:

    SELECT  FREQUENCYID, OTHERCOLUMN
    FROM    mytable
    WHERE   FREQUENCYID = 2
    

    can be performed by joining the values from the indexes on FREQUENCYID and OTHERCOLUMN on ROWID.

Quassnoi