When would both of the indexes below be necessary?

create index i_t_a_b on t(a,b);

create index i_t_b_a on t(b,a);
+2  A: 

When you want maximum retrieval speed and have both columns in the join or where conditions, BUT sometimes column a has the higher selectivity and sometimes column b does, and you want to capitalize on that with a single index lookup.

Also, I think the ratio of data size to machine performance would have to be quite high, and at the same time (guesstimating) you would have to be willing to call any improvement a necessity, even if it is only a few percent.

Still, experience teaches that things depend on a lot of factors; with your specific RDBMS and application environment, you had better run your own benchmarks.

EDIT: Further explanation of composite indexes, from Wikipedia:
"The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column.
For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If you are given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name. You would have to look within each city's section for the entries with that last name."

Wikipedia's explanation is maybe overly simplified, but it gives you the basic idea (as analogies go, keep in mind that phone books usually have clustered indexes, and that wouldn't be your general database index).

Depending on the size of the index vs. the size of the data structure vs. available memory vs. selectivity on the first column of the index, it still might be much less expensive to use a wrongly ordered index than to use table scans.
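To make the column-order point concrete, here is a minimal sketch in Python. It is not how MySQL stores an index; it only models a composite index as a sorted list of (a, b) keys. The data and the names `rows`, `index_a_b`, `lookup_by_a` and `lookup_by_b` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical table t: 100 distinct values of a, 10 values of b for each.
rows = [(a, b) for a in range(100) for b in range(10)]

# A composite index on (a, b), modelled as a sorted list of (a, b) keys.
index_a_b = sorted(rows)

def lookup_by_a(index, a):
    """Leading-column lookup: binary search finds one contiguous range."""
    lo = bisect_left(index, (a,))      # first key with this value of a
    hi = bisect_left(index, (a + 1,))  # first key with the next value of a
    return index[lo:hi]

def lookup_by_b(index, b):
    """Second-column lookup: matches are scattered, so every key is checked."""
    return [key for key in index if key[1] == b]

print(len(lookup_by_a(index_a_b, 5)))  # matches found via binary search
print(len(lookup_by_b(index_a_b, 3)))  # matches found by walking the whole index
```

Even the slow path only walks the index keys, which are usually narrower than full rows; that is the sense in which a wrongly ordered index can still beat a table scan.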

Ah, just thought of a better analogy, with the example you are looking for. Imagine a nice textbook: it would have a table of contents with chapters and subchapters and the page numbers they start on (which is a non-clustered index that holds pointers to the data records, the pages). Now imagine that the textbook is on the SQL-92 standard; then most of the terms in the TOC would be SQL terms (do hold this assumption). You would also have another index at the end of the book, which would list all the interesting terms in alphabetical order (let's assume with major chapter names) and page numbers.

For a question such as 'Tell me all the chapters under which DISTINCT appears' you would use the second index (because the selectivity of the latter field is high).

For a question such as 'Tell me the number of terms that appear under the first chapter' you would use the TOC.

So for questions such as 'Is SELECT described under the DML chapter?' you might use either of the indexes (because the selectivity of both fields is high). However, if the TOC for DML itself is 3 pages long and the SELECT entry in the index has only fifteen lines, you would probably go to the second one, and that is an example of when you benefit from both indexes.

Now, if you think that's too far-fetched, do take a database of the scanned Library of Congress into consideration. :)

As I said before, all the planning is fine, but in the end do run your own benchmarks.

Unreason
+1: Nice explanation. Feel free to up-vote my answer too - in case you agree :)
Peter Lang
+1  A: 

I don't think there is any real case where you need that.

It could make sense when your table has a lot more columns, a and b are not unique, and you need high performance with both of the following queries:

Select Max(b) From t Where a=1  --# Would use i_t_a_b

and

Select Max(a) From t Where b=1  --# Would use i_t_b_a

Let's say your table looks like this:

a  b  c  d  e
-  -  -  -  -
0  8  x  x  x
0  9  x  x  x
1  8  x  x  x
1  9  x  x  x

i_t_a_b looks something like this:

0
  8
  9
1
  8
  9

i_t_b_a looks something like this:

8
  0
  1
9
  0
  1

Select Max(b) From t Where a=1

would have to dig into 8 and 9 of i_t_b_a to find all rows with a=1. This is still much faster than a full-table scan (having to read all the x too), but it is not as fast as using i_t_a_b.
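The difference can be sketched with a toy model in Python (an illustration only, not MySQL's actual storage or execution). Each index is a sorted list of key pairs, and the functions `max_b_via_a_b` and `max_b_via_b_a` are hypothetical helpers that return the answer together with the number of index entries read after any binary search.

```python
from bisect import bisect_left

# The four (a, b) pairs from the example table; columns c, d, e are omitted.
rows = [(0, 8), (0, 9), (1, 8), (1, 9)]

i_t_a_b = sorted(rows)                     # ordered by a, then b
i_t_b_a = sorted((b, a) for a, b in rows)  # ordered by b, then a

def max_b_via_a_b(a):
    """Max(b) Where a=?: the largest b sits at the end of the a-range."""
    hi = bisect_left(i_t_a_b, (a + 1,))    # position just past the a-range
    if hi == 0 or i_t_a_b[hi - 1][0] != a:
        return None, 0                     # no entry with this value of a
    return i_t_a_b[hi - 1][1], 1           # one entry read

def max_b_via_b_a(a):
    """Same query on the (b, a) index: each entry has to be inspected."""
    best, visited = None, 0
    for b, a_value in i_t_b_a:
        visited += 1
        if a_value == a and (best is None or b > best):
            best = b
    return best, visited

print(max_b_via_a_b(1))  # (9, 1)
print(max_b_via_b_a(1))  # (9, 4)
```

With four rows the gap is trivial; the number of entries visited through i_t_b_a grows with the size of the index, while the lookup through i_t_a_b stays a binary search plus one read.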

Peter Lang
I ran a test and found that i_t_a_b can also be used for `b=1`, and vice versa
symfony
@symfony: yes, it can be used, and it is better than doing a full scan of the table, but for b=1 i_t_b_a performs better than i_t_a_b
Unreason
Can you give some analysis of this? Though intuitively it sounds reasonable
symfony
@symfony: I tried to add a simplified explanation of what I mean.
Peter Lang
I don't think it matters, because MySQL should be clever enough to decide which side of the index to scan from
symfony
This has nothing to do with being clever. MySQL is clever enough to use the index, but the way the index is built makes it impossible to start scanning *from the other side*.
Peter Lang
shamelessly pointing to my explanation :) http://stackoverflow.com/questions/2500440/does-compound-index-have-direction-in-mysql/2500576#2500576
Unreason
How can it be faster than a full-table scan when it's the second indexed column, if it is impossible for MySQL to start scanning from the other side?
symfony
@symfony, if you study the example index layouts that @Peter_Lang has added to his answer, I think you'll see the difference between the two compound indexes and be able to reason out the use for each. By the way, if you replace all VARCHAR columns with CHAR columns, table scans are faster because all the records are the same length. MySQL knows this and can scan the table more quickly. It won't have to read every column to skip to the next record.
Marcus Adams
I'm aware of the difference. Maybe you should read my question more carefully? The keyword is **second indexed column**
symfony
@symfony: It's faster since the index takes less space than the full table (less to read) **and** since the second column is still indexed. MySQL needs to go through all available values of `a`, but within them, column `b` is indexed so it's still faster to find the values you need.
Peter Lang
@Peter +1 (you could expand the index example and clarify that it is not possible to access values of b directly in i_t_a_b, only values of a or of (a,b))
Unreason