I have a somewhat complex query against a table with roughly 100K rows.

The query runs in 13 seconds on SQL Server Express (on my dev box).

The same query, with the same tables and indexes, takes over 15 minutes on MySQL 5.1 (on my production box, a much more powerful machine, tested with 100% of its resources available), and it sometimes crashes the machine with an out-of-memory error.

What am I doing wrong in MySQL? Why does it take so long?

select e8.*
from table_a e8
inner join (
    select max(e6.id) as id, e6.category, e6.entity, e6.service_date
    from (
        select e4.*
        from table_a e4
        inner join (
            select max(e2.id) as id, e3.rank, e2.entity, e2.provider_id, e2.service_date
            from table_a e2
            inner join (
                select min(e1.rank) as rank, e1.entity, e1.provider_id, e1.service_date
                from table_a e1
                where e1.site_id is not null
                group by e1.entity, e1.provider_id, e1.service_date
            ) as e3
            on e2.rank = e3.rank
            and e2.entity = e3.entity
            and e2.provider_id = e3.provider_id
            and e2.service_date = e3.service_date
            group by e2.entity, e2.provider_id, e2.service_date, e3.rank
        ) e5
        on e4.id = e5.id
        and e4.rank = e5.rank
    ) e6
    group by e6.category, e6.entity, e6.service_date
) e7
on e8.id = e7.id
and e8.category = e7.category
+1  A: 

100,000 rows shouldn't take 13 seconds if efficient indexes are available. I suspect the difference is due to the fact that SQL Server has a much more robust query optimizer than MySQL; what MySQL has is more on the order of a SQL parser than an optimizer.
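
To make that concrete: the real schema of table_a was never posted, so the sketch below uses SQLite as a stand-in, with column names taken from the query and invented types and sample data. It shows the kind of index both engines would want here - a composite index whose leading columns match the innermost GROUP BY, with the remaining referenced columns appended so the aggregate can be read straight off the index (a covering index).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (
    id INTEGER PRIMARY KEY,
    rank INTEGER, entity TEXT, provider_id INTEGER,
    service_date TEXT, site_id INTEGER
);
INSERT INTO table_a VALUES
    (1, 2, 'acme',  10, '2009-01-01', 5),
    (2, 1, 'acme',  10, '2009-01-01', 5),
    (3, 3, 'bravo', 11, '2009-02-01', NULL);  -- filtered out by site_id IS NOT NULL

-- Leading columns match the e1 GROUP BY; rank and site_id make it covering.
CREATE INDEX ix_group ON table_a (entity, provider_id, service_date, rank, site_id);
""")

inner_query = """
    SELECT min(rank), entity, provider_id, service_date
    FROM table_a
    WHERE site_id IS NOT NULL
    GROUP BY entity, provider_id, service_date
"""
rows = conn.execute(inner_query).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + inner_query).fetchall()
print(rows)  # [(1, 'acme', 10, '2009-01-01')]
print(plan)  # the plan detail should mention ix_group (a covering index scan)
```

On MySQL the same idea applies (plain EXPLAIN instead of EXPLAIN QUERY PLAN); whether 5.1 actually uses such an index at each nesting level is exactly what the missing schema and index list would reveal.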

You'll need to provide a lot more information - full schemas of all participating tables and a full list of indexes on each, for starters.

Then some idea of what the data is about and what the query is intended to produce - something on the order of a use case.

le dorfier
Isn't a SQL statement a math problem? It's possible to write an equivalent statement without any other knowledge.
mson
But it's not possible to evaluate its effectiveness. Proper SQL development doesn't consist of making a bunch of guesses and trying them out; it requires knowledge of the context. Equivalent statements don't evoke identical query plans, especially between two different products.
le dorfier
+1  A: 

It'd be interesting to compare the execution plans (EXPLAIN) from both to see what the differences are. I'm not sure it's an apples-and-oranges comparison, but I'd be curious.

I don't know if this can help, but this was the first hit on a search for "mysql query optimizer".

duffymo
A: 

Here's another one that might be worthwhile.

duffymo
+2  A: 

I originally tried to post this answer to your since-deleted question, which did not indicate that MySQL was involved. I would still go ahead and use SQL Server to refactor the query using CTEs, and then convert back to nested queries (if any remain). Sorry about the formatting - Jeff Atwood sent me the originally posted text and I had to reformat it again.

It's hard to do without data, expected results and good names, but I would convert all the nested queries into CTEs, stack them up, name them meaningfully, and refactor - starting by excluding the columns you aren't using. Removing the columns alone is not going to produce the improvement, because the optimizer is pretty smart - but it WILL give you the ability to improve your query, probably by factoring out some or all of the CTEs. I'm not sure what your code is doing, but you may find the newer RANK()-type window functions useful, because you appear to be using a seek-back pattern with all these self-joins.
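
For example, on the SQL Server side the whole "lowest rank, then highest id" seek-back can collapse into ranking passes. This is a sketch, not the author's actual query: SQLite stands in for SQL Server (MySQL 5.1 has no window functions), the schema and data are invented from the column names in the query, and it assumes the rows picked at the e5 level should also satisfy site_id IS NOT NULL - the original query does not re-apply that filter after e3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (
    id INTEGER PRIMARY KEY,
    rank INTEGER, entity TEXT, provider_id INTEGER,
    service_date TEXT, category TEXT, site_id INTEGER
);
INSERT INTO table_a VALUES
    (1, 2, 'acme',  10, '2009-01-01', 'lab',  5),
    (2, 1, 'acme',  10, '2009-01-01', 'lab',  5),
    (3, 1, 'acme',  10, '2009-01-01', 'lab',  5),  -- ties row 2 on rank; higher id wins
    (4, 1, 'acme',  11, '2009-01-01', 'lab',  5),
    (5, 3, 'bravo', 10, '2009-02-01', 'xray', 5);
""")

rows = conn.execute("""
WITH best_rank AS (   -- replaces e3 + e5: best row per (entity, provider_id, service_date)
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY entity, provider_id, service_date
               ORDER BY rank ASC, id DESC
           ) AS rn
    FROM table_a
    WHERE site_id IS NOT NULL
),
best_id AS (          -- replaces e6 + e7: highest id per (category, entity, service_date)
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY category, entity, service_date
               ORDER BY id DESC
           ) AS rn2
    FROM best_rank
    WHERE rn = 1
)
SELECT id, entity, provider_id, category
FROM best_id
WHERE rn2 = 1
ORDER BY id
""").fetchall()

print(rows)  # [(4, 'acme', 11, 'lab'), (5, 'bravo', 10, 'xray')]
```

Two ROW_NUMBER() passes replace four levels of self-join and aggregation, which is usually both clearer and cheaper.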

So start from here instead. I've looked at e7 improvements for you: the unused columns in e7 may indicate either a defect or incomplete thinking about the grouping possibilities. If those columns are truly unnecessary, this may trickle all the way back through your logic in e6, e5 and e3. If the grouping in e7 is correct, then you can eliminate everything but max(id) from the results and the join. I cannot see why you would have multiple MAX(id) values per category - that would multiply your results when you join - so the MAX(id) must be unique within the category, in which case category is redundant in the join.

WITH e3 AS (
    select min(e1.rank) as rank,
           e1.entity,
           e1.provider_id,
           e1.service_date
    from table_a e1
    where e1.site_id is not null
    group by e1.entity, e1.provider_id, e1.service_date
)

, e5 AS (
    select max(e2.id) as id,
           e3.rank,
           e2.entity,
           e2.provider_id,
           e2.service_date
    from table_a e2
    inner join e3
        on e2.rank = e3.rank
        and e2.entity = e3.entity
        and e2.provider_id = e3.provider_id
        and e2.service_date = e3.service_date
    group by e2.entity, e2.provider_id, e2.service_date, e3.rank
)

, e6 AS (
    select e4.* -- switch from * to only the columns you are actually using
    from table_a e4
    inner join e5
        on e4.id = e5.id
        and e4.rank = e5.rank
)

, e7 AS (
    select max(e6.id) as id, e6.category -- unused: e6.entity, e6.service_date
    from e6
    group by e6.category, e6.entity, e6.service_date
    -- This instead:
    -- select max(e6.id) as id
    -- from e6
    -- group by e6.category, e6.entity, e6.service_date
)

select e8.*
from table_a e8
inner join e7
    on e8.id = e7.id
    and e7.category = e8.category
    -- THIS INSTEAD: on e8.id = e7.id
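
To sanity-check the claim that category is redundant in that final join, here is a toy SQLite run (data invented; as throughout, id is assumed to be the table's primary key, so each MAX(id) identifies exactly one row) comparing the two join predicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (
    id INTEGER PRIMARY KEY,
    category TEXT, entity TEXT, service_date TEXT
);
INSERT INTO table_a VALUES
    (1, 'lab',  'acme',  '2009-01-01'),
    (2, 'lab',  'acme',  '2009-01-01'),
    (3, 'xray', 'acme',  '2009-01-01'),
    (4, 'xray', 'bravo', '2009-02-01');
""")

# The e7 grouping: one max(id) per (category, entity, service_date).
e7 = "SELECT max(id) AS id, category FROM table_a GROUP BY category, entity, service_date"

with_category = conn.execute(f"""
    SELECT e8.* FROM table_a e8
    INNER JOIN ({e7}) e7 ON e8.id = e7.id AND e8.category = e7.category
    ORDER BY e8.id""").fetchall()

id_only = conn.execute(f"""
    SELECT e8.* FROM table_a e8
    INNER JOIN ({e7}) e7 ON e8.id = e7.id
    ORDER BY e8.id""").fetchall()

print(with_category == id_only)  # True: id alone already pins down the row
```

Because category is one of the grouping columns, the row that max(id) points back to necessarily carries that same category, so the extra predicate can never change the result - it only gives the optimizer more to chew on.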
Cade Roux
Thanks Cade - this was just the sort of thing I was looking for! However, this will make SQL Server even faster... MySQL does not have CTEs. I was trying to figure out whether I had done something wrong or whether it was the fault of MySQL.
mson
Yes, but if you simplify the query in SQL Server by refactoring the nested queries, the result will probably be better on MySQL as well.
Cade Roux
I had an even more complex query in SQL Server - using CTEs has taken the query from 5 minutes to 7 seconds. I'm pretty convinced now that MySQL isn't for me or my domain. MySQL can handle large datasets well, but it cannot handle complex or large queries very well.
mson
I had read about CTEs before, but I had forgotten about them - thanks so much!
mson
If your nested queries are truly identical, I'm not sure that the CTE will actually improve performance that much - because I'm pretty sure the optimizer is smart enough to treat identical subqueries the same. It does make it a lot easier to read and to refactor.
Cade Roux
I had to run a report with a bunch of UNIONs based on counts based on LIKEs... The CTEs reduced the processing time by more than two orders of magnitude. I haven't tried them on the original query, but I definitely benefited from them in another query.
mson
I am finding UNIONs extremely slow in SQL Server 2005 (more so than in 2000). The CTEs may end up being materialized (which is how I manually speed up slow UNIONs).
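
A hypothetical sketch of that manual-materialization trick, reusing the e3 subquery from above (SQLite syntax again; in SQL Server you would write SELECT ... INTO #e3 instead): dump the CTE's result into an indexed temp table once, so later joins can seek into it rather than recompute it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (
    id INTEGER PRIMARY KEY,
    rank INTEGER, entity TEXT, provider_id INTEGER,
    service_date TEXT, site_id INTEGER
);
INSERT INTO table_a VALUES
    (1, 2, 'acme',  10, '2009-01-01', 5),
    (2, 1, 'acme',  10, '2009-01-01', 5),
    (3, 1, 'bravo', 11, '2009-02-01', NULL);

-- Materialize what the e3 CTE computes...
CREATE TEMP TABLE e3 AS
    SELECT min(rank) AS rank, entity, provider_id, service_date
    FROM table_a
    WHERE site_id IS NOT NULL
    GROUP BY entity, provider_id, service_date;

-- ...then index it so the e5-style join can seek instead of rescanning.
CREATE INDEX ix_e3 ON e3 (entity, provider_id, service_date, rank);
""")

rows = conn.execute("SELECT * FROM e3 ORDER BY entity").fetchall()
print(rows)  # [(1, 'acme', 10, '2009-01-01')]
```

The payoff comes when the same intermediate result is joined more than once: it is computed and indexed a single time instead of per reference.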
Cade Roux
A: 

The only open-source database I know of that has CTEs is Firebird (http://www.firebirdsql.org/rlsnotesh/rlsnotes210.html#rnfb210-cte).

Postgres will have them in 8.4, I think.

Hugues Van Landeghem