ansaurus

Question

MySQL query optimization - distinct, order by and limit

Answer 1

+2 A:

The query without DISTINCT runs faster because of the LIMIT clause. Without DISTINCT, the server only needs to look at the first hundred matches. However, some of those rows may have duplicate fields, so if you introduce the DISTINCT clause, the server has to look at many more rows in order to find a hundred that do not have duplicate values.

BTW, why are you using OUTER JOIN?
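One way to check the effect of LIMIT with and without DISTINCT is to compare how many rows the server actually reads for each variant. The following is a minimal sketch, assuming the table and column names that appear in the queries of Answer 3 and a session that is allowed to run FLUSH STATUS; the numbers it reports will depend on your data.

```sql
-- Sketch only: table and column names are taken from the queries in Answer 3.
FLUSH STATUS;  -- reset this session's Handler_% counters

-- Plain query: the server can stop as soon as 100 joined rows are found.
SELECT this_.id
FROM Rental this_
    JOIN RentalRequest rr ON rr.rental_id = this_.id
    JOIN RentalSegment rs ON rs.rentalRequest_id = rr.id
WHERE rs.endDate BETWEEN 1273631699529 AND 1274927699529
LIMIT 0, 100;

SHOW SESSION STATUS LIKE 'Handler_read%';  -- rows read by the plain query

FLUSH STATUS;

-- DISTINCT query: the server must keep reading until it has 100 different ids.
SELECT DISTINCT this_.id
FROM Rental this_
    JOIN RentalRequest rr ON rr.rental_id = this_.id
    JOIN RentalSegment rs ON rs.rentalRequest_id = rr.id
WHERE rs.endDate BETWEEN 1273631699529 AND 1274927699529
LIMIT 0, 100;

SHOW SESSION STATUS LIKE 'Handler_read%';  -- rows read by the DISTINCT query
```

If the second set of counters is much larger than the first, the extra time is being spent on skipping duplicates.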

e4c5 2010-05-27 04:15:38

@Manuel Darveau: Indeed, why use OUTER JOINs? Have you tried validating the results using 'normal' (inner) joins? They should be much faster!

lexu 2010-05-27 05:20:36

Answer 2

+1 A:

Hi,

Here, for the "rentalsegm2_" table, the optimizer has chosen the "index_endDate" index, and the number of rows it expects to read from this table is about 450,000 (4.5 lakhs). Since there are other WHERE conditions, you can also check the indexes on the "this_" table. I mean, you can check how many rows of the "this_" table match each WHERE condition.

In summary, you can try alternative solutions by changing the indexes the optimizer uses. This can be done with the "USE INDEX" and "FORCE INDEX" index hints.
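The checks suggested above could look roughly like the sketch below. The WHERE values are copied from the query in Answer 3, and `idx_rental_billing` is a hypothetical index name used only for illustration; substitute an index that SHOW INDEX actually lists.

```sql
-- List the indexes that exist on the Rental table (aliased this_ in the query).
SHOW INDEX FROM Rental;

-- Count how many rows each WHERE condition keeps on its own.
SELECT COUNT(*) FROM Rental WHERE DTYPE = 'B';
SELECT COUNT(*) FROM Rental WHERE billingStatus = 1;
SELECT COUNT(*) FROM Rental WHERE id <= 1848978;

-- Ask the optimizer to use a specific index and compare the resulting plan.
-- idx_rental_billing is a hypothetical name, not an index known to exist.
EXPLAIN
SELECT DISTINCT this_.id AS y0_
FROM Rental this_ FORCE INDEX (idx_rental_billing)
    JOIN RentalRequest rentalrequ1_
      ON this_.id = rentalrequ1_.rental_id
    JOIN RentalSegment rentalsegm2_
      ON rentalrequ1_.id = rentalsegm2_.rentalRequest_id
WHERE this_.DTYPE = 'B'
    AND this_.billingStatus = 1
    AND this_.id <= 1848978
    AND rentalsegm2_.endDate BETWEEN 1273631699529 AND 1274927699529
LIMIT 0, 100;
```

USE INDEX only suggests the listed indexes, while FORCE INDEX additionally makes a table scan look very expensive to the optimizer, so FORCE INDEX is the stronger of the two hints.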

Thanks

Rinson KE DBA www.qburst.com

RINSON KE 2010-05-27 07:19:09

Answer 3

+1 A:

From the execution plan we see that the optimizer is smart enough to understand that you do not require OUTER JOINs here. Still, it is better to specify that explicitly.
The DISTINCT modifier means that you want to GROUP BY all fields in the SELECT part, that is, ORDER BY all of the specified fields and then discard duplicates. In other words, the order by rentalsegm2_.id asc clause does not make any sense here.

The query below should return the equivalent result:

select distinct this_.id as y0_
from Rental this_
    join RentalRequest rentalrequ1_ 
      on this_.id=rentalrequ1_.rental_id
    join RentalSegment rentalsegm2_ 
      on rentalrequ1_.id=rentalsegm2_.rentalRequest_id
where
    this_.DTYPE='B'
    and this_.id<=1848978
    and this_.billingStatus=1
    and rentalsegm2_.endDate between 1273631699529 and 1274927699529
limit 0, 100;

UPD

If you want the execution plan to start with RentalSegment, you will need to add the following indices to the database:

RentalSegment (endDate)
RentalRequest (id, rental_id)
Rental (id, DTYPE, billingStatus) or (id, billingStatus, DTYPE)
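The indexes listed above could be created along these lines. This is a sketch: the index names are hypothetical, only the column lists come from the answer, and any index that already exists (Answer 2 mentions an existing index_endDate) should be skipped.

```sql
-- Index names are hypothetical; the column lists come from the answer.
-- Check SHOW INDEX FROM <table> first and skip indexes that already exist.
CREATE INDEX idx_segment_endDate ON RentalSegment (endDate);
CREATE INDEX idx_request_id_rental ON RentalRequest (id, rental_id);

-- Either column order from the answer can be tried; only one is needed.
CREATE INDEX idx_rental_id_dtype_status ON Rental (id, DTYPE, billingStatus);
```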

The query then could be rewritten as the following:

SELECT this_.id as y0_
FROM RentalSegment rs
    JOIN RentalRequest rr
    JOIN Rental this_
WHERE rs.endDate between 1273631699529 and 1274927699529
    AND rs.rentalRequest_id = rr.id
    AND rr.rental_id <= 1848978
    AND rr.rental_id = this_.id
    AND this_.DTYPE='B'
    AND this_.billingStatus = 1
GROUP BY this_.id
LIMIT 0, 100;

If the execution plan does not start from RentalSegment, you can force it with STRAIGHT_JOIN.
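A minimal sketch of the STRAIGHT_JOIN variant follows. It assumes the same tables and filter values as the queries above (with the DTYPE value from the first query) and moves the join conditions into ON clauses; the SELECT STRAIGHT_JOIN modifier tells MySQL to join the tables in the order they are listed in FROM.

```sql
-- STRAIGHT_JOIN makes MySQL read the tables in FROM-clause order:
-- RentalSegment first, then RentalRequest, then Rental.
SELECT STRAIGHT_JOIN this_.id AS y0_
FROM RentalSegment rs
    JOIN RentalRequest rr ON rs.rentalRequest_id = rr.id
    JOIN Rental this_ ON rr.rental_id = this_.id
WHERE rs.endDate BETWEEN 1273631699529 AND 1274927699529
    AND rr.rental_id <= 1848978
    AND this_.DTYPE = 'B'
    AND this_.billingStatus = 1
GROUP BY this_.id
LIMIT 0, 100;
```

Running the statement under EXPLAIN should show rs as the first row of the plan if the hint took effect.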

newtover 2010-05-27 08:59:32

I don't understand why the order by does not make sense. Since this is a paginated criteria query (I will increase the first param in limit on every call), I must specify an order by, or else results will not always be returned in the same order and the pagination will not work. Am I missing something?

Manuel Darveau 2010-05-27 12:37:29

@Manuel Darveau: the field `rentalsegm2_.id` is not mentioned in the SELECT part and can potentially be in a 1-to-many relation to `this_.id`. When you GROUP BY `this_.id`, sorting by `rentalsegm2_.id` does not make sense anymore, since the value by which the sorting is performed can be taken from an arbitrary row of each group. The GROUP BY operation (and DISTINCT) assumes implicit sorting by the grouped columns (you can even say `GROUP BY this_.id DESC` in MySQL) and the results come out in that sorted order.

newtover 2010-05-27 12:46:54

@Manuel Darveau: I meant that ordering by `rentalsegm2_.id` does not make sense, but explicit ordering by `this_.id asc` does (it is implicit in this case).
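For pagination, the explicit ordering mentioned here might be written as in the sketch below, which assumes the same tables and filter values as the queries in the answer and a page size of 100. Spelling out ORDER BY is the safer form in any case, since MySQL 8.0 and later no longer sort GROUP BY results implicitly.

```sql
-- Second page (rows 101-200), ordered by the selected column itself
-- so that consecutive pages are stable and do not overlap.
SELECT this_.id AS y0_
FROM Rental this_
    JOIN RentalRequest rentalrequ1_
      ON this_.id = rentalrequ1_.rental_id
    JOIN RentalSegment rentalsegm2_
      ON rentalrequ1_.id = rentalsegm2_.rentalRequest_id
WHERE this_.DTYPE = 'B'
    AND this_.id <= 1848978
    AND this_.billingStatus = 1
    AND rentalsegm2_.endDate BETWEEN 1273631699529 AND 1274927699529
GROUP BY this_.id
ORDER BY this_.id ASC
LIMIT 100, 100;  -- offset 100, page size 100
```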

newtover 2010-05-27 12:53:40

@newtover: I used to order by this_.id but the query took 5 times longer. The problem is that MySQL first selects from the Rental table, then from RentalRequest and then from RentalSegment. The explain plan is the opposite of what I originally posted. I tested your query and it took 12 sec (compared to my original query, which took 3 sec). My biggest restriction is on RentalSegment, so I guess MySQL should start the query on it.

Manuel Darveau 2010-05-27 14:32:15

@Manuel Darveau: I updated my answer.

newtover 2010-05-27 15:09:54
