views:

56

answers:

1

Given the following table:

desc exchange_rates;
+------------------+----------------+------+-----+---------+----------------+
| id               | int(11)        | NO   | PRI | NULL    | auto_increment | 
| time             | datetime       | NO   | MUL | NULL    |                | 
| base_currency    | varchar(3)     | NO   | MUL | NULL    |                | 
| counter_currency | varchar(3)     | NO   | MUL | NULL    |                | 
| rate             | decimal(32,16) | NO   |     | NULL    |                | 
+------------------+----------------+------+-----+---------+----------------+

I have added indexes on time, base_currency and counter_currency, as well as a composite index on (time, base_currency, counter_currency), but I'm seeing a big performance difference when I perform a SELECT using <= against using <.

The first SELECT is:

ExchangeRate Load (95.5ms)   
SELECT * FROM `exchange_rates` WHERE (time <= '2009-12-30 14:42:02' and base_currency = 'GBP' and counter_currency = 'USD') LIMIT 1

As you can see this is taking 95ms.

If I change the query such that I compare time using < rather than <= I see this:

ExchangeRate Load (0.8ms)   
SELECT * FROM `exchange_rates` WHERE (time < '2009-12-30 14:42:02' and base_currency = 'GBP' and counter_currency = 'USD') LIMIT 1

Now it takes less than 1 millisecond, which sounds right to me. Is there a rational explanation for this behaviour?

The output from EXPLAIN provides further details, but I'm not 100% sure how to intepret this:

-- Output from the first, slow, select
SIMPLE      | 5,5     | exchange_rates | 1  | index_exchange_rates_on_time,index_exchange_rates_on_base_currency,index_exchange_rates_on_counter_currency,time_and_currency | index_merge | Using intersect(index_exchange_rates_on_counter_currency,index_exchange_rates_on_base_currency); Using where | 813  |     | index_exchange_rates_on_counter_currency,index_exchange_rates_on_base_currency

-- Output from the second, fast, select
SIMPLE      | 5       | exchange_rates | 1  | index_exchange_rates_on_time,index_exchange_rates_on_base_currency,index_exchange_rates_on_counter_currency,time_and_currency | ref  | Using where | 4988 | const | index_exchange_rates_on_counter_currency

(Note: I'm producing these queries through ActiveRecord (in a Rails app) but these are ultimately the queries which are being executed)

+2  A: 

In the first case, MySQL tries to combine results from all indexes. It fetches all records from both indexes and joins them on the value of the row pointer (table offset in MyISAM, PRIMARY KEY in InnoDB).

In the second case, it just uses a single index, which, considering LIMIT 1, is the best decision.

You need to create a composite index on (base_currency, counter_currency, time) (in this order) for this query to work as fast as possible.

The engine will use the index for filtering on the leading columns (base_currency, counter_currency) and for ordering on the trailing column (time).

It also seems you want to add something like ORDER BY time DESC to your query to get the last exchange rate.

In general, any LIMIT without ORDER BY should ring the bell.

Quassnoi
Hi, thanks but I have actually already added a composite index on (time, base_currency, counter_currency).
Olly
`@Olly`: the columns go in the wrong order. The order matters.
Quassnoi
The only remaining point is, why on earth is the query planner outputting different plans when the <= or < on the time column is insignificant for the index usage. Strange stuff.
Michael
`@Michael`: I've never seen it myself (and cannot reproduce it for now), but seems that most records are exactly within the timestamp passed as a parameter and `@OP` just hit the tipping point where `MySQL` prefers one plan over the other. Anyway, one needs access to actual data to tell exactly.
Quassnoi
`@Quassnoi`: Regarding the order, I thought the composite index had to reflect the order in which the fields appear in the WHERE clause? In my case, time comes first, then base_currency, then counter_currency, hence I created the index using (time, base_currency, counter_currency). Is this not correct?
Olly
`@Olly`: no, this is not correct. With `AND`, the order in the `WHERE` clause does not matter. An index orders the records lexicographically (first on `base_currency`, second on `counter_currency`, finally on `time`). With your `base_currency` and `counter_currency` equifiltered and `time` range filtered, the index can give a single continuous range or records: from `('GBP', 'USD', -INF)` to `('GBP', 'USD', NOW())`. See here for more detailed explanation on composite indexes: http://explainextended.com/2009/05/09/creating-indexes/
Quassnoi
It's the same as with a dictionary. Say, you need the first letter to be `A`, second letter to be `S` and the third letter to be less than `K`. This means all words from `'AS'` to `'ASK'` which are on one or two pages and you can easily read them through. But if you ask for the second and third letter to be `AP` and the first letter to be more than `'A'`, you'll need to read trough the whole dictionary from `'BAP*'` to `'ZAP*'`.
Quassnoi