views:

35

answers:

1

I'm encountering a strange behavior of MySQL. Query execution (i.e. the usage of indexes as shown by explain [QUERY]) and time needed for execution are dependent on the elements of the where clause.

Here is a query where the problem occurs:

select distinct
e1.idx, el1.idx, r1.fk_cat, r2.fk_cat
from ent e1, ent_leng el1, rel_c r1, _tax_c t1, rel_c r2, _tax_c t2
where el1.fk_ent=e1.idx
and r1.fk_ent=e1.idx and ((r1.fk_cat=43) or (r1.fk_cat=t1.fk_cat1 and t1.fk_cat2=43))
and r2.fk_ent=e1.idx and ((r2.fk_cat=10) or (r2.fk_cat=t2.fk_cat1 and t2.fk_cat2=10))

The corresponding explain output is:

| id | select_type | table | type   | possible_keys           | key     | key_len | ref           | rows  | Extra                       
+----+-------------+-------+--------+-------------------------+---------+---------+---------------+-------+------------------------------------
|  1 | SIMPLE      | el1   | index  | fk_ent                  | fk_ent  | 4       | NULL          | 15002 | Using index; Using temporary
|  1 | SIMPLE      | e1    | eq_ref | PRIMARY                 | PRIMARY | 4       | DB.el1.fk_ent |     1 | Using index
|  1 | SIMPLE      | r1    | ref    | fk_ent,fk_cat,fks       | fks     | 4       | DB.e1.idx     |     1 | Using where; Using index
|  1 | SIMPLE      | r2    | ref    | fk_ent,fk_cat,fks       | fks     | 4       | DB.el1.fk_ent |     1 | Using index
|  1 | SIMPLE      | t1    | index  | fk_cat1,fk_cat2,fk_cats | fk_cats | 8       | NULL          |    69 | Using where; Using index; Distinct; 
|    |             |       |        |                         |         |         |               |       | Using join buffer
|  1 | SIMPLE      | t2    | index  | fk_cat1,fk_cat2,fk_cats | fk_cats | 8       | NULL          |    69 | Using where; Using index; Distinct;  
                                                                                                          | Using join buffer

As you can see a one-column index has the same name as the column it belongs to. I also added some useless indexes along with the used ones, just to see if they change the execution (which they don't).

The execution takes ~4.5 seconds.

When I add the column entl1.name to the select part (nothing else changed), the index fk_ent in el1 cannot be used any more:

| id | select_type | table | type   | possible_keys           | key     | key_len | ref           | rows  | Extra                       
+----+-------------+-------+--------+-------------------------+---------+---------+---------------+-------+------------------------------------
|  1 | SIMPLE      | el1   | ALL    | fk_ent                  |  NULL   | NULL    | NULL          | 15002 | Using temporary

The execution now takes ~8.5 seconds.

I always thought that the select part of a query does not influence the usage of indexes by the engine and doesn't affect performance in such a way.

Leaving out the attribute isn't a solution, and there are even more attributes that i have to select. Even worse, the query in the used form is even a bit more complex and that makes the performance issue a big problem.

So my questions are: 1) What is the reason for this strange behavior? 2) How can I solve the performance problem?

Thanks for your help! Gred

+2  A: 

It's the DISTINCT restriction. You can think of that as another WHERE restriction. When you change the select list, you are really changing the WHERE clause for the DISTINCT restriction, and now the optimizer decides that it has to do a table scan anyway, so it might as well not use your index.

EDIT:

Not sure if this helps, but if I am understanding your data correctly, I think you can get rid of the DISTINCT restriction like this:

select
e1.idx, el1.idx, r1.fk_cat, r2.fk_cat
from ent e1
  Inner Join ent_leng el1 ON el1.fk_ent=e1.idx
  Inner Join rel_c r1 ON r1.fk_ent=e1.idx
  Inner Join rel_c r2 ON r2.fk_ent=e1.idx
where 
 ((r1.fk_cat=43) or Exists(Select 1 From _tax_c t1 Where r1.fk_cat=t1.fk_cat1 and t1.fk_cat2=43)) 
 and 
 ((r2.fk_cat=10) or Exists(Select 1 From _tax_c t2 Where r2.fk_cat=t2.fk_cat1 and t2.fk_cat2=10))
Bill
Sounds reasonable. But isn't a solution to the problem (I'd either have to deal with a huge, quasi redundant result set or use a complex index). And, strangely, removing DISTINCT doesn't change the EXPLAIN output nor does it speed up the query, as I just checked.
GredPapp
Just tested: Your query seems to yield the same results as mine - and it is much faster! Thanks! I think I have to dive deeper into SQL to use it appropriately.
GredPapp