views:

55

answers:

2

I am struggling (again) with the table from this question: http://stackoverflow.com/questions/1555561/how-to-optimize-this-query-4-mm-tables-involved


It is one main table product_table which has four MM relations via the lookup tables mm1 to mm4. The lookup tables have the fields uid_local containing the uid of the product_table and uid_foreign containing the uid of the categories by which the result gets filtered.

After Quassnois suggestion from the above mentioned question which improved the performance the query looks like this:

SELECT
COUNT(*)
FROM
product_table
WHERE
(product_table.partner_id = 0 OR product_table.partner_id = 15) AND
EXISTS(SELECT NULL FROM mm1 WHERE mm1.uid_local = product_table.uid AND mm1.uid_foreign IN (10, 11, 12, 13, 14, 53)) AND
EXISTS(SELECT NULL FROM mm2 WHERE mm2.uid_local = product_table.uid AND mm2.uid_foreign IN (7, 8, 9)) AND
EXISTS(SELECT NULL FROM mm3 WHERE mm3.uid_local = product_table.uid AND mm3.uid_foreign IN (51 ,52)) AND
EXISTS(SELECT NULL FROM mm4 WHERE mm4.uid_local = product_table.uid AND mm4.uid_foreign IN (15, 16, 17, 18, 19))

This returns somewhat around 10.000 rows as COUNT(*) but it takes > 0.5 seconds for the query to execute. This is too slow.

EXPLAIN looks like this:

id  select_type         table           type  possible_keys           key         key_len   ref       rows    Extra
1   PRIMARY             product_table   ref   NewIndex                NewIndex    4         const     9430    Using where
5   DEPENDENT SUBQUERY  mm1             ref   uid_local,uid_foreign   uid_local   4         mm1.uid   5       Using where
4   DEPENDENT SUBQUERY  mm2             ref   uid_local,uid_foreign   uid_local   4         mm2.uid   2       Using where
3   DEPENDENT SUBQUERY  mm3             ref   uid_local,uid_foreign   uid_local   4         mm3.uid   3       Using where
2   DEPENDENT SUBQUERY  mm4             ref   uid_local,uid_foreign   uid_local   4         mm4.uid   6       Using where

If I change the product_table.partner_id = 0 to something that returns less rows e. g. a few hundred or so the query is very fast (0.015 sec).

So the problem is that the COUNT(*) query is very fast if it needs to count only a few rows (100 - 200) but very slow on counting a larger result set (> 10.000 rows).

Any ideas on how to fix this?


A few facts:

  • the mm tables have indexes on uid_local and uid_foreign
  • The product_table has a indexed PK and there is another index on partner_id
  • The product_table table contains around 30.000 rows


EDIT

The IN clauses are not fixed, the query could instead of IN (7, 8, 9) look like IN (7, 9) or other combination of numbers.

A: 

You could use materialized views for the EXISTS parts of your query. This way, you could fold the IN statements into a single query. Then you just need to create an index on uid and the query should take almost no time anymore:

SELECT
COUNT(*)
FROM
product_table
WHERE
(product_table.partner_id = 0 OR product_table.partner_id = 15) AND
EXISTS(SELECT NULL FROM vmm1 WHERE vmm1.uid_local = product_table.uid) AND
EXISTS(SELECT NULL FROM vmm2 WHERE vmm2.uid_local = product_table.uid) AND
EXISTS(SELECT NULL FROM vmm3 WHERE vmm3.uid_local = product_table.uid) AND
EXISTS(SELECT NULL FROM vmm4 WHERE vmm4.uid_local = product_table.uid)

The drawback: If you change the mm-tables a lot, the view will need to be updated and that will make the changes to those tables slower.

Aaron Digulla
What are materialized views? Is the query then still able to take parameters? The user selects the values for the IN clause via a web frontend.
Max
From your example, I gathered that the IN clauses are fix. If they can change, then this doesn't work.
Aaron Digulla
A materialized views is a view that gets copied into a dynamic table. The contents of that table will change when you change rows in the underlying tables.
Aaron Digulla
I clarified the example, but thank you for your suggestion and the explanation, I never heard about materialized views before. Would have been nice to use them, but it does not work in this case.
Max
A: 

take a look at FORCE INDEX in the mysql manual. if you have 10k rows in your M2M table that match, the optimizer might have decided that scanning the table is better than consulting the index, but in your case, probably not.

longneck
But EXPLAIN output indicates that all indexes are used, or did I read it wrong?
Max