views:

99

answers:

3

I am doing a query that converts rows to columns similar to this post but have encountered a performance problem. Here is the query:-

SELECT
    Info.Customer,
    Answers.Answer,
    Answers.AnswerDescription,
    Details.Code1,
    Details.Code2,
    Details.Code3
FROM
    Info
    LEFT OUTER JOIN Answers
    ON Info.AnswerID = Answers.AnswerID
    LEFT OUTER JOIN
    (SELECT
      ReferenceNo,
      MAX(CASE DetailsIndicator WHEN 'cde1' THEN DetailsCode ELSE NULL END ) Code1,
      MAX(CASE DetailsIndicator WHEN 'cde2' THEN DetailsCode ELSE NULL END ) Code2,
      MAX(CASE DetailsIndicator WHEN 'cde3' THEN DetailsCode ELSE NULL END ) Code3
    FROM DetailsData
    GROUP BY ReferenceNo) Details
    ON Info.ReferenceNo = Details.ReferenceNo
    WHERE Info.Date BETWEEN x AND y

There are less than 300 rows returned, but the Details table is about 180 thousand rows. The query takes 45 seconds to run in half the time. Note the inner query takes 7 seconds to run.

When I type show processlist; into MYSQL it is hanging on "Sending Data".

Any thoughts as to what the performance problem might be?

+1  A: 

Well as a start I would move the where info.date and join of Details and Info inside the subquery.

Also, are the Details.ReferenceNo and Info.Date indexed?

EDIT: Well, here's the version that I had in mind

SELECT
    Details.Customer,
    Answers.Answer,
    Answers.AnswerDescription,
    Details.Code1,
    Details.Code2,
    Details.Code3
FROM
    (SELECT
      Info.Customer,
      Info.AnswerID,
      Info.ReferenceNo,
      MAX(CASE DetailsIndicator WHEN 'cde1' THEN DetailsCode ELSE NULL END ) Code1,
      MAX(CASE DetailsIndicator WHEN 'cde2' THEN DetailsCode ELSE NULL END ) Code2,
      MAX(CASE DetailsIndicator WHEN 'cde3' THEN DetailsCode ELSE NULL END ) Code3
    FROM Info LEFT JOIN DetailsData ON Info.ReferenceNo = Details.ReferenceNo
    WHERE Info.Date BETWEEN x AND y
    GROUP BY ReferenceNo) Details
    LEFT OUTER JOIN Answers ON Details.AnswerID = Answers.AnswerID

So the first iteration is to reduce the number of records generated in the subquery (reason: the source of it has several hundred records and there is aggregation on it, so mysql has to build a resultset for further joining) - in that sense DRapp's solution and this one are similar.

However, the question is if it is really necessary to have a subquery at all? It would be intresting to see how would the following perform

SELECT
    Info.Customer,
    Answers.Answer,
    Answers.AnswerDescription,
    MAX(CASE DetailsIndicator WHEN 'cde1' THEN DetailsCode ELSE NULL END ) Code1,
    MAX(CASE DetailsIndicator WHEN 'cde2' THEN DetailsCode ELSE NULL END ) Code2,
    MAX(CASE DetailsIndicator WHEN 'cde3' THEN DetailsCode ELSE NULL END ) Code3
FROM
    FROM Info 
    LEFT JOIN DetailsData ON Info.ReferenceNo = DetailsData.ReferenceNo
    LEFT OUTER JOIN Answers ON Details.AnswerID = Answers.AnswerID
WHERE Info.Date BETWEEN x AND y
GROUP BY ReferenceNo

Given the fact that there are only hundreds records in the result and if there are indexes covering joins and select condition I would expect the above to return results in less then a second on a modest hardware.

(queries untested)

Unreason
yes, although your comment was to include the date in the subquery, it was still needed in the outer as well. You would have restricted the internal range, but left the outer to include all dates... But after running, I'm sure you would have noticed that too.
DRapp
Actually, the above is what I had in mind; also while explaining I realized that subquery can (should) go.
Unreason
I too was considering your second version, but the group by has to be inclusive of all non-aggregate fields, so he would have to either MAX( Answer.Answer ) Answer, same on description, or add to the group by.
DRapp
@DRapp, no it does not, this is mysql :) (before so I thought exactly what you said, too; even left the same comment not so long ago), see http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html
Unreason
A: 

Try use "EXPLAIN", to define which query is slow

smg
+2  A: 

First, the inner subselect that queries the "Details" result is querying against ALL entries... is that what you want? I don't think so. You only appear to want results based on a reference number that qualifies from the outer date check.

So, I would change your inner select to include...

FROM DetailsData 
where DetailsData.ReferenceNo IN 
    ( select distinct Info2.ReferenceNo
        from Info Info2 where Info2.Date between x and y )
GROUP BY ReferenceNo) Details 

This way, you are only getting details associated with reference numbers within the date range in question.

DRapp
I'll give this a go and get back to you thanks :-)
Tarski
I'll give +1 cause I think it is useful, but I've said the same :)
Unreason
Nice one, it worked in under 5 seconds the first time I ran it ;-)
Tarski
@Tarski, could you check the timing on my last query version, too? I am really interested how it will fare and don't have data of the same size/structure around.
Unreason