When would having both of the following indexes be necessary?
create index i_t_a_b on t(a,b);
create index i_t_b_a on t(b,a);
When you want maximum retrieval speed and have both columns in the join or WHERE conditions, BUT sometimes column a is the more selective one and sometimes column b is, and you want to capitalize on that fact by using whichever single index suits the query.
Also, I think the ratio of data size to machine performance would have to be quite high, and at the same time you would (guesstimating) have to be willing to call any improvement a necessity, even one of only a few percent.
Still, experience teaches that things depend on a lot of factors; with a specific RDBMS and application environment you had better run your own benchmarks.
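Since benchmarking is the real answer, here is a minimal harness sketch for comparing "i_t_a_b only" against "both indexes". It uses SQLite through Python's built-in `sqlite3` module purely as a stand-in RDBMS. The table shape, row count, value ranges and the probed value 42 are made-up assumptions, and SQLite timings say little about other engines.

```python
import random
import sqlite3
import time

def build_table(n_rows, index_ddl, seed=0):
    """Build an in-memory t(a, b, c, d, e) with random data, then add the given indexes."""
    rng = random.Random(seed)  # fixed seed: every configuration gets identical data
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT, d TEXT, e TEXT)")
    filler = "x" * 20  # hypothetical stand-in for the wide columns c, d, e
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
        [(rng.randrange(100), rng.randrange(1000), filler, filler, filler)
         for _ in range(n_rows)],
    )
    for ddl in index_ddl:
        conn.execute(ddl)
    conn.commit()
    conn.execute("ANALYZE")  # give SQLite's planner statistics to work with
    return conn

def time_query(conn, sql, params, repeat=100):
    """Return (average seconds per run, result row of the last run)."""
    start = time.perf_counter()
    for _ in range(repeat):
        result = conn.execute(sql, params).fetchone()
    return (time.perf_counter() - start) / repeat, result

QUERY = "SELECT max(a) FROM t WHERE b = ?"
CONFIGS = {
    "i_t_a_b only": ["CREATE INDEX i_t_a_b ON t(a, b)"],
    "both indexes": ["CREATE INDEX i_t_a_b ON t(a, b)",
                     "CREATE INDEX i_t_b_a ON t(b, a)"],
}

results = {}
for name, ddl in CONFIGS.items():
    conn = build_table(20_000, ddl)
    avg, value = time_query(conn, QUERY, (42,))
    results[name] = (avg, value)
    print(f"{name}: {avg * 1e6:.1f} us per query, max(a) = {value[0]}")
    conn.close()
```

The result must be identical in both configurations; only the timing should differ. Swap in your own schema, data distribution and queries before drawing any conclusion.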
EDIT:
Further explanation of composite indexes.
From Wikipedia:
"The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column.
For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If you are given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name. You would have to look within each city's section for the entries with that last name."
Wikipedia's explanation is perhaps oversimplified, but it gives you the basic idea (as with any analogy, keep in mind that phone books are usually organized like a clustered index, which would not be your typical database index).
Depending on the size of the index vs. the size of the table vs. available memory vs. the selectivity of the first column of the index, it might still be much less expensive to use a wrongly ordered index than to do a table scan.
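To make the phone book analogy concrete, here is a small sketch that models an index on (city, last_name, first_name) as a sorted list of tuples. The entries are hypothetical. A lookup by city is one binary search. A lookup by last name alone has to search separately inside each city's section, so its cost grows with the number of cities. That is the "wrongly ordered index" case: slower than a properly ordered index, but it still skips most entries instead of reading them all.

```python
from bisect import bisect_left, bisect_right

# Hypothetical entries (city, last_name, first_name, phone). Sorting the tuples
# orders them like an index on (city, last_name, first_name).
entries = sorted([
    ("Boston", "Smith", "Ann", "555-0101"),
    ("Boston", "Jones", "Bob", "555-0102"),
    ("Denver", "Smith", "Cal", "555-0201"),
    ("Denver", "Adams", "Dee", "555-0202"),
    ("Austin", "Smith", "Eve", "555-0301"),
])
cities = [e[0] for e in entries]  # leading column, in the same sorted order

def by_city(city):
    """Leading column known: binary search finds the city's contiguous section."""
    return entries[bisect_left(cities, city):bisect_right(cities, city)]

def by_last_name(last):
    """Leading column unknown: search separately inside every city's section."""
    found, i = [], 0
    while i < len(entries):
        city = entries[i][0]
        end = bisect_right(cities, city)                # end of this city's section
        j = bisect_left(entries, (city, last), i, end)  # first (city, last, ...) entry
        while j < end and entries[j][1] == last:
            found.append(entries[j])
            j += 1
        i = end                                         # jump to the next city
    return found

print(by_city("Boston"))
print(by_last_name("Smith"))
```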
Ah, I just thought of a better analogy, with an example of what you are looking for. Imagine a nice textbook: it would have a table of contents (TOC) listing chapters and subchapters with the page numbers where they start (that is a non-clustered index holding pointers to data records, i.e. pages). Now imagine that the textbook is on the SQL-92 standard, so most of the terms in the TOC would be SQL terms (do hold on to this assumption). You would also have another index at the end of the book listing all the interesting terms in alphabetical order (let's assume with the major chapter names) and their page numbers.
For a question such as 'Tell me all the chapters under which DISTINCT appears', you would use the second index (because the selectivity of the latter field is high).
For a question such as 'Tell me the number of terms that appear under the first chapter', you would use the TOC.
For a question such as 'Is SELECT described under the DML chapter?', you might use either index (because the selectivity of both fields is high). However, if the TOC of the DML chapter is itself 3 pages long and the SELECT entry in the back index has only fifteen lines, you would probably go to the second one, and that is an example of when you benefit from having both indexes.
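The textbook analogy maps directly onto the two indexes from the question: the TOC is like an index on (chapter, term) and the back-of-book index is like one on (term, chapter). The sketch below models both as sorted tuple lists built from the same hypothetical (chapter, term) pairs. For the "either index" question it reads whichever list its rough per-key counts predict is shorter, mimicking the reader in the analogy. A real B-tree could also seek straight to the full (chapter, term) key in either index; the counts-based choice is only an illustration of the idea.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical (chapter, term) pairs from an SQL-92 textbook.
pairs = [
    ("DDL", "CREATE"), ("DDL", "ALTER"), ("DDL", "DROP"),
    ("DML", "SELECT"), ("DML", "INSERT"), ("DML", "UPDATE"),
    ("DML", "DELETE"), ("DML", "DISTINCT"),
    ("Queries", "SELECT"), ("Queries", "DISTINCT"),
    ("Intro", "SELECT"),
]
toc = sorted(pairs)                            # like an index on (chapter, term)
back_index = sorted((t, c) for c, t in pairs)  # like an index on (term, chapter)

# Rough per-key counts, playing the role of a planner's statistics.
chapter_sizes = Counter(c for c, _ in pairs)
term_sizes = Counter(t for _, t in pairs)

def prefix_range(index, key):
    """All entries of a sorted (key1, key2) list whose key1 equals key."""
    lo = bisect_left(index, (key,))
    hi = lo
    while hi < len(index) and index[hi][0] == key:
        hi += 1
    return index[lo:hi]

def chapters_for(term):
    """'Tell me all the chapters under which <term> appears': use the back index."""
    return [c for _, c in prefix_range(back_index, term)]

def term_count(chapter):
    """'Tell me the number of terms under <chapter>': use the TOC."""
    return len(prefix_range(toc, chapter))

def is_described(term, chapter):
    """Either index can answer; read the list expected to be shorter."""
    if term_sizes[term] <= chapter_sizes[chapter]:
        return chapter in chapters_for(term), "back index"
    return any(t == term for _, t in prefix_range(toc, chapter)), "TOC"

print(chapters_for("DISTINCT"))
print(is_described("SELECT", "DML"))
print(is_described("SELECT", "Intro"))
```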
Now, if you think that's too far-fetched, consider a database of the scanned Library of Congress. :)
As I said before, planning is all fine, but in the end do run your own benchmarks.
I don't think there is any real case where you need that.
It could make sense when your table has a lot more columns, a and b are not unique, and you need high performance with both of the following queries:
Select Max(b) From t Where a=1  -- would use i_t_a_b
and
Select Max(a) From t Where b=1  -- would use i_t_b_a
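One way to check which index each query actually uses is to ask the planner. The sketch below uses SQLite (via Python's `sqlite3`) as an assumed stand-in engine and loads the example rows shown next. It prints the `detail` column of `EXPLAIN QUERY PLAN`, whose exact wording varies between SQLite versions, so only the index name is worth relying on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT, d TEXT, e TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, 'x', 'x', 'x')",
                 [(0, 8), (0, 9), (1, 8), (1, 9)])
conn.execute("CREATE INDEX i_t_a_b ON t(a, b)")
conn.execute("CREATE INDEX i_t_b_a ON t(b, a)")

def plan(sql):
    """Concatenate the 'detail' column (always the last one) of EXPLAIN QUERY PLAN."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q1 = "SELECT max(b) FROM t WHERE a = 1"
q2 = "SELECT max(a) FROM t WHERE b = 1"
print(q1, "->", plan(q1))
print(q2, "->", plan(q2))
```

Other engines have their own equivalents (for example `EXPLAIN` in PostgreSQL and MySQL), with different output formats.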
Let's say your table looks like this:
a b c d e
- - - - -
0 8 x x x
0 9 x x x
1 8 x x x
1 9 x x x
i_t_a_b looks something like this:
0
  8
  9
1
  8
  9
i_t_b_a looks something like this:
8
  0
  1
9
  0
  1
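The index layouts above can be reproduced with a simple model: an index is its key columns in sorted order plus a pointer to the row. In this sketch the "pointer" is just the row's position in a Python list, which is an assumption made for illustration rather than how any particular engine stores row identifiers.

```python
# The four example rows from the table above, as (a, b, c, d, e).
rows = [(0, 8, "x", "x", "x"), (0, 9, "x", "x", "x"),
        (1, 8, "x", "x", "x"), (1, 9, "x", "x", "x")]

# Model each index as (key columns..., row pointer), kept in sorted order.
i_t_a_b = sorted((a, b, rowid) for rowid, (a, b, *_rest) in enumerate(rows))
i_t_b_a = sorted((b, a, rowid) for rowid, (a, b, *_rest) in enumerate(rows))

def show(name, index):
    """Print an index grouped by its leading column, like the layouts above."""
    print(name)
    previous = None
    for first, second, _rowid in index:
        if first != previous:
            print(first)
            previous = first
        print(f"  {second}")

show("i_t_a_b", i_t_a_b)
show("i_t_b_a", i_t_b_a)
```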
Without i_t_a_b, Select Max(b) From t Where a=1 would have to dig into both the 8 and the 9 branches of i_t_b_a to find all rows with a=1. This is still much faster than a full-table scan (which would also have to read all the x values), but it is not as fast as using i_t_a_b.
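The difference can be sketched by counting lookups on the same toy keys. With i_t_a_b, the a=1 entries are contiguous and the last one holds max(b), so one seek suffices. With only i_t_b_a, the code makes one lookup inside every b branch (8 and 9 here), so the work grows with the number of distinct b values. The `seeks` counter is a rough, hypothetical proxy for I/O; the search for each group's boundary is treated as bookkeeping and not counted.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")
i_t_a_b = [(0, 8), (0, 9), (1, 8), (1, 9)]  # keys sorted by (a, b)
i_t_b_a = [(8, 0), (8, 1), (9, 0), (9, 1)]  # keys sorted by (b, a)

def max_b_via_a_b(a):
    """One seek: the a-group is contiguous and its last entry holds max(b).
    Returns (max_b, seeks)."""
    lo = bisect_left(i_t_a_b, (a,))
    hi = bisect_right(i_t_a_b, (a, INF))
    return (i_t_a_b[hi - 1][1] if lo < hi else None), 1

def max_b_via_b_a(a):
    """Without i_t_a_b: look for a inside every b branch.
    Returns (max_b, seeks)."""
    best, seeks, i = None, 0, 0
    while i < len(i_t_b_a):
        b = i_t_b_a[i][0]
        end = bisect_right(i_t_b_a, (b, INF))     # end of this b branch
        j = bisect_left(i_t_b_a, (b, a), i, end)  # look for a inside the branch
        seeks += 1
        if j < end and i_t_b_a[j][1] == a:
            best = b  # branches come in ascending b, so the last hit is the max
        i = end
    return best, seeks

print(max_b_via_a_b(1))
print(max_b_via_b_a(1))
```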