views:

311

answers:

3

A colleague asked me to explain how indexes (indices?) boost up performance; I tried to do so, but got confused myself.
I used the model below for explanation (an error/diagnostics logging database). It consists of three tables:

  • List of business systems, table "System" containing their names
  • List of different types of traces, table "TraceTypes", defining what kinds of error messages can be logged
  • Actual trace messages, having foreign keys from System and TraceTypes tables

I used MySQL for the demo, however I don't recall the table types I used. I think it was InnoDB.

 System                                TraceTypes
-----------------------------         ------------------------------------------
| ID          | Name        |         | ID    | Code   | Description           |
-----------------------------         ------------------------------------------
| 1           | billing     |         | 1     | Info   | Informational mesage  |
| 2           | hr          |         | 2     | Warning| Warning only          |
-----------------------------         | 3     | Error  | Failure               |
           |                          ------------------------------------------
           |                ------------|
 Traces    |                |            
 --------------------------------------------------
 | ID | System_ID | TraceTypes_ID | Message       |
 --------------------------------------------------
 | 1  |  1        |  1            | Job starting  |
 | 2  |  1        |  3            | System.nullr..|
 --------------------------------------------------

First, i added some records to all of the tables and demonstrated that the query below executes in 0.005 seconds:

select count(*) from Traces 
  inner join System on Traces.System_ID = System.ID
  inner join TraceTypes on Traces.TraceTypes_ID = TraceTypes.ID
where 
  System.Name='billing' and TraceTypes.Code = 'Info'

Then I generated more data (no indexes yet)

  • "System" contained about 100 entries
  • "TraceTypes" contained about 50 entries
  • "Traces" contained ~10 million records.

Now the previous query took 8-10 seconds.

I created indexes on Traces.System_ID column and Traces.TraceTypes_ID column. Now this query executed in milliseconds:

select count(*) from Traces where System_id=1 and TraceTypes_ID=1;

This was also fast:

select count(*) from Traces 
  inner join System on Traces.System_ID = System.ID
where System.Name='billing' and TraceTypes_ID=1;

but the previous query which joined all the three tables still took 8-10 seconds to complete.

Only when I created a compound index (both System_ID and TraceTypes_ID columns included in index), the speed went down to milliseconds.

The basic statement I was taught earlier is "all the columns you use for join-ing, must be indexed".
However, in my scenario I had indexes on both System_ID and TraceTypes_ID, however MySQL didn't use them. The question is - why? My bets is - the item count ratio 100:10,000,000:50 makes the single-column indexes too large to be used. But is it true?

A: 

My guess would be that it would be using the index and then it might be using traditional look up to move to another index and then filter out. Please check the execution plan. So in short you might be looping through two indexes in nested loop. As per my understanding. We should try to make a composite index on column which are in filtering or in join and then we should use Include clause for the columns which are in select. I have never worked in MySql so my this understanding is based on SQL Server 2005.

Nitin Midha
+1  A: 

It's not the size of the indexes so much as the selectivity that determines whether the optimizer will use them.

Mitch Wheat
A: 

First, the correct, and the easiest, way to analyze a slow SQL statement is to do EXPLAIN. Find out how the optimizer chose its plan and ponder on why and how to improve that. I'd suggest to study the EXPLAIN results with only 2 separate indexes to see how mysql execute your statement.

I'm not very familiar with MySQL, but it seems that there's restriction of MySQL 4 of using only one index per table involved in a query. There seems to be improvements on this since MySQL 5 (index merge), but I'm not sure whether it applies to your case. Again, EXPLAIN should tell you the truth.

Even with using 2 indexes per table allowed (MySQL 5), using 2 separate indexes is generally slower than compound index. Using 2 separate indexes requires index merge step, compared to the single pass of using a compound index.

Multi Column indexes vs Index Merge might be helpful, which uses MySQL 5.4.2.

bryantsai
tahnk you, i never knew of such "one index per table" rule, but it seems to be logical and also - to explain my problem (I was in mysql5.4.something, though).
naivists