ansaurus

Question

MySQL not using index with JOIN, WHERE and ORDER

Answer 1

+3 A:

When I try to reproduce this query using your scripts:

SELECT  A.a, A.b, B.c
FROM    A
JOIN    B
ON      A.b = B.b
WHERE   a = 44
ORDER BY
        c

, it completes in 0.0043 seconds (instantly), returns 930 rows and yields this plan:

1, 'SIMPLE', 'A', 'ref', 'PRIMARY', 'PRIMARY', '4', 'const', 1610, 'Using index; Using temporary; Using filesort'
1, 'SIMPLE', 'B', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.A.b', 1, ''

It's quite efficient for such a query.

For such a query, you cannot use a single index both for filtering and sorting.

See this article in my blog for more detailed explanations:

Choosing index

If you expect your query to return few records, you should use the index on A for filtering and then sort using filesort (like the query above does).

If you expect it to return many records (and LIMIT them), you need to use an index for sorting and then filter:

CREATE INDEX ix_a_b ON A (b);
CREATE INDEX ix_b_c ON B (c);

SELECT  *
FROM    B FORCE INDEX (ix_b_c)
JOIN    A
ON      A.b = B.b
ORDER BY
        B.c
LIMIT 10;

1, 'SIMPLE', 'B', 'index', '', 'ix_b_c', '4', '', 2, 'Using index'
1, 'SIMPLE', 'A', 'ref', 'ix_a_b', 'ix_a_b', '4', 'test.B.b', 4, 'Using index'

Quassnoi 2009-08-04 13:50:19

With the real data, the record table is quite big (both in width and in number of rows, with lots of VARCHAR(255) columns), and therefore the temporary table costs more, as there is a lot more data to copy. On our production DB (8-core Xeon with everything in memory) the query takes about 0.05-0.1 s, and an MV test shows sub-0.01 s times.

Paso 2009-08-04 13:58:51

I don't get the same query plan as you printed above for the same query. Anyway, the change in ORDER doesn't really help me: sure, it removes the filesort, but I get the results in the wrong order! Also, just changing the ORDER in the original query to "B.b, B.c" removes the filesort, indicating (well, to me ;)) that it could be possible to do this without a temporary table/filesort. (Funny thing, I actually borrowed the SP for inserting from your blog.)

Paso 2009-08-04 15:04:42

@Paso: Sorry, didn't understand your task well. Create an index on `b.c` only and change the `ORDER BY` condition. I'll update it in the post now.

Quassnoi 2009-08-04 15:09:13

@Paso: I noticed :) Glad to hear you read my blog. One little note: when filling an `InnoDB` table in a procedure, always do it in a transaction, it's way faster.

Quassnoi 2009-08-04 15:10:18
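
A minimal sketch of the transaction tip above. The procedure name `prc_fill_a`, its parameter, and the assumption that `A` has integer columns `a` and `b` are all hypothetical; the thread shows neither the real schema nor the procedure borrowed from the blog.

```sql
-- Hypothetical filler procedure; names and columns are assumptions.
-- DELIMITER is a mysql command-line client directive, not server SQL.
DELIMITER $$

CREATE PROCEDURE prc_fill_a(cnt INT)
BEGIN
    DECLARE i INT DEFAULT 1;

    -- One explicit transaction around the whole loop,
    -- instead of an implicit commit after every INSERT.
    START TRANSACTION;
    WHILE i <= cnt DO
        INSERT INTO A (a, b) VALUES (i, i);
        SET i = i + 1;
    END WHILE;
    COMMIT;
END
$$

DELIMITER ;

CALL prc_fill_a(100000);
```

With autocommit enabled and no explicit transaction, each INSERT is committed on its own, which typically means a log flush per row; wrapping the loop in a single transaction avoids that cost.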

@Paso: could you please post which plans you get when running the queries?

Quassnoi 2009-08-04 15:14:45

Now I actually get the same query execution plan as in your post; there must have been something funny with the indexing last time I ran it. However, the execution time quickly gets pretty bad as the LIMIT increases, so I'm not sure it's relevant for me. In fact, I'm not sure I can use a LIMIT at all without some major rewrite of the business logic.

Paso 2009-08-04 16:35:29

@Paso: again, with this layout you cannot filter **and** sort. You need two things here: filter on `B.b` and sort on `B.c`. An index cannot satisfy both: you cannot build a single range ordered both on `b` and on `c`. If you are expecting **few** values, you'd better filter a range using the index on `B.b` and then sort the selected values, since sorting few values is fast. Otherwise, select ordered data from the index on `B.c` and then just filter them using the index on `A.b`. This will require selecting all values from the index on `B.c` in their order, and without a `LIMIT` it will take constant time.

Quassnoi 2009-08-04 16:55:18
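
The two strategies described in the comment above can be compared side by side. This is only a sketch: it reuses the `ix_a_b` and `ix_b_c` indexes from the answer, while the `STRAIGHT_JOIN` hint, the `LIMIT 10`, and the combination of the `WHERE` filter with the forced index are assumptions added for illustration. Which plan the optimizer actually picks depends on the MySQL version and the data, so `EXPLAIN` is the way to check.

```sql
-- Strategy 1: filter first, sort afterwards.
-- Suits a filter that matches few rows; expect a filesort in the plan.
EXPLAIN
SELECT  A.a, A.b, B.c
FROM    A
JOIN    B
ON      B.b = A.b
WHERE   A.a = 44
ORDER BY
        B.c;

-- Strategy 2: read B in ix_b_c order and filter the rows as they come.
-- STRAIGHT_JOIN keeps B as the leading table (an assumption, used here
-- only to pin the join order); LIMIT lets the scan stop early.
EXPLAIN
SELECT  A.a, A.b, B.c
FROM    B FORCE INDEX (ix_b_c)
STRAIGHT_JOIN
        A
ON      A.b = B.b
WHERE   A.a = 44
ORDER BY
        B.c
LIMIT 10;
```

In the second form the rows already arrive ordered by `B.c`, so no sort is needed, but the index has to be walked until enough matching rows are found, which gets slower as the `LIMIT` grows or the filter becomes more selective.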

Ok, thanks. I guess I will have to go the MV way with the added complexity that means.

Paso 2009-08-04 17:19:17

Answer 2

A:

select A.a, A.b, B.c from A join B on A.b = B.b where a = 44 order by c;

If you alias the columns, does that help? Example:

 SELECT 
 T1.a AS colA, 
 T2.b AS colB, 
 T2.c AS colC 
 FROM A AS T1 
 JOIN B AS T2 
 ON (T1.b = T2.b) 
 WHERE 
 T1.a = 44 
 ORDER BY colC;

The only changes I made were:

I put the join conditions in parentheses
The join conditions and where conditions are based on table columns
The ORDER BY condition is based on the resulting table column
I aliased the result table columns and the queried tables to (hopefully) make it clearer when I was using one or the other (and clearer to the server: your original query leaves the column unqualified in two places, a = 44 and ORDER BY c).

I know your real data is more complex, but I assume that you provided a simple version of the query because the problem is at that simple level.

Anthony 2009-08-04 14:15:33

I'm afraid not, your query gives exactly the same EXPLAIN result.

Paso 2009-08-04 14:21:27

Do you actually want to join the two tables? What I mean is, do the two tables link up so that each row is a full result based on the query, or is it more like each row has the data needed from both tables? I ask because if the two tables are not actually tied together in such a way that a join is required, you might consider a UNION instead. With a UNION, the queries are completely independent and thus no sub-queries or temporary tables or anything else taxing needs to happen.

Anthony 2009-08-04 14:31:06

I don't really understand. The tables are JOINed over A.b = B.b and I need the data from B for each A matching a condition; how would a UNION help here? For completeness: no, I don't need all the data, only the data from B. See the tag example at the top of the question; that should explain everything as precisely as I can.

Paso 2009-08-04 14:36:50
