I thought I would be clever and use a subquery to get my report in one go. But after running into problems and reading the documentation, I saw that my approach does not perform well in MySQL. My inner query returns ~100 records and the outer query scans 20000 records. When I restricted the outer query to 20 records, it still ran for 20 seconds, which is really slow.

Is it possible to restructure it so that the inner query isn't run again for every record in the outer query?

SELECT p1.surname, p1.name, p1.id, r1.start_date, r1.end_date, c1.short_name
FROM ejl_players p1
LEFT JOIN ejl_registration r1 ON (r1.player_id = p1.id)
LEFT JOIN ejl_teams t1 ON (r1.team_id = t1.id)
LEFT JOIN ejl_clubs c1 ON (t1.club_id = c1.id)
WHERE r1.season = 2008
AND p1.id IN
(
    SELECT p.id
    FROM ejl_players p
    LEFT JOIN ejl_registration r ON (r.player_id = p.id)
    LEFT JOIN ejl_teams t ON (r.team_id = t.id)
    LEFT JOIN ejl_clubs c ON (t.club_id = c.id)
    WHERE r.season = 2008
    GROUP BY p.id
    HAVING COUNT(DISTINCT c.id) > 1
)

EXPLAIN output (I restricted the outer query to a maximum of 20 records):

id  select_type         table  type    possible_keys   key      key_len  ref                        rows   Extra
1   PRIMARY             p1     range   PRIMARY         PRIMARY  4        NULL                       19     Using where
1   PRIMARY             r1     ref     team_id,season  season   10       const,d17528sd14898.p1.id  1      Using where
1   PRIMARY             t1     eq_ref  PRIMARY         PRIMARY  4        d17528sd14898.r1.team_id   1
1   PRIMARY             c1     eq_ref  PRIMARY         PRIMARY  4        d17528sd14898.t1.club_id   1
2   DEPENDENT SUBQUERY  p      index   PRIMARY         PRIMARY  5        NULL                       23395  Using index
2   DEPENDENT SUBQUERY  r      ref     team_id,season  season   10       const,d17528sd14898.p.id   1      Using where; Using index
2   DEPENDENT SUBQUERY  t      eq_ref  PRIMARY         PRIMARY  4        d17528sd14898.r.team_id    1
2   DEPENDENT SUBQUERY  c      eq_ref  PRIMARY         PRIMARY  4        d17528sd14898.t.club_id    1      Using index
+5  A: 

Try using an INNER JOIN (something like this):

SELECT p1.surname ,p1.name,p1.id,r1.start_date,r1.end_date,c1.short_name
FROM ejl_players p1
INNER JOIN (
    SELECT p.id
    FROM ejl_players p 
    LEFT JOIN ejl_registration r ON (r.player_id = p.id) 
    LEFT JOIN ejl_teams t ON (r.team_id = t.id) 
    LEFT JOIN ejl_clubs c ON (t.club_id = c.id)
    WHERE r.season = 2008
    GROUP BY p.id
    HAVING COUNT(DISTINCT c.id)  > 1
) p2 ON p1.id = p2.id
LEFT JOIN ejl_registration r1 ON ( r1.player_id = p1.id )
LEFT JOIN ejl_teams t1 ON ( r1.team_id = t1.id )
LEFT JOIN ejl_clubs c1 ON ( t1.club_id = c1.id )
WHERE  r1.season=2008

Using the subquery this way is usually more efficient, though not always. It does, however, avoid executing the subquery once for every row of the main query. Instead, the subquery is evaluated a single time into a temporary derived table, which is then joined to the main query.
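To check that the IN form and the derived-table form return the same rows, here is a minimal sketch on a toy dataset. It uses Python's built-in sqlite3 module rather than MySQL, so it only demonstrates that the results match, not the performance difference. The sample rows and the trimmed-down table definitions are invented for illustration.

```python
import sqlite3

# In-memory toy database; the column sets are a guess at the minimum needed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ejl_players (id INTEGER PRIMARY KEY, surname TEXT, name TEXT);
CREATE TABLE ejl_clubs (id INTEGER PRIMARY KEY, short_name TEXT);
CREATE TABLE ejl_teams (id INTEGER PRIMARY KEY, club_id INTEGER);
CREATE TABLE ejl_registration (
    player_id INTEGER, team_id INTEGER, season INTEGER,
    start_date TEXT, end_date TEXT);
INSERT INTO ejl_players VALUES (1, 'Tamm', 'Jaan'), (2, 'Kask', 'Mari');
INSERT INTO ejl_clubs VALUES (10, 'AAA'), (20, 'BBB');
INSERT INTO ejl_teams VALUES (100, 10), (200, 20);
INSERT INTO ejl_registration VALUES
    (1, 100, 2008, '2008-01-01', '2008-06-30'),
    (1, 200, 2008, '2008-07-01', '2008-12-31'),
    (2, 100, 2008, '2008-01-01', '2008-12-31');
""")

# Players registered with teams of more than one club in 2008.
multi_club = """
    SELECT p.id
    FROM ejl_players p
    LEFT JOIN ejl_registration r ON (r.player_id = p.id)
    LEFT JOIN ejl_teams t ON (r.team_id = t.id)
    LEFT JOIN ejl_clubs c ON (t.club_id = c.id)
    WHERE r.season = 2008
    GROUP BY p.id
    HAVING COUNT(DISTINCT c.id) > 1
"""

select_list = ("SELECT p1.surname, p1.name, p1.id, "
               "r1.start_date, r1.end_date, c1.short_name")
detail_joins = """
    LEFT JOIN ejl_registration r1 ON (r1.player_id = p1.id)
    LEFT JOIN ejl_teams t1 ON (r1.team_id = t1.id)
    LEFT JOIN ejl_clubs c1 ON (t1.club_id = c1.id)
"""

# Original form: subquery inside IN.
in_query = f"""
    {select_list}
    FROM ejl_players p1
    {detail_joins}
    WHERE r1.season = 2008 AND p1.id IN ({multi_club})
    ORDER BY r1.start_date
"""

# Rewritten form: subquery joined as a derived table.
join_query = f"""
    {select_list}
    FROM ejl_players p1
    INNER JOIN ({multi_club}) p2 ON p1.id = p2.id
    {detail_joins}
    WHERE r1.season = 2008
    ORDER BY r1.start_date
"""

rows_in = conn.execute(in_query).fetchall()
rows_join = conn.execute(join_query).fetchall()

print("same rows:", rows_in == rows_join)
for row in rows_join:
    print(row)
```

On real data, the timing comparison still has to be done in MySQL itself, for example by prefixing each statement with EXPLAIN.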

Edit: I should point out that you'll want to use EXPLAIN in MySQL to verify that this query is indeed performing more efficiently.

Noah Goodrich
IN is a bad construct to use in almost all circumstances; the above is much better.
PeteT
Thanks! This one works real fast.
Riho
+1  A: 

Like I commented on your question the other day, you don't need to use a LEFT JOIN in this example. Outer joins often perform slower than inner joins, so you can get some better performance by using a simple inner join.

You would need to use an outer join only if you need to show all players, even those who don't have any registration.
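A small sketch of that difference, using Python's built-in sqlite3 module and made-up rows (not the real schema or data). It also shows a related point: when the WHERE clause filters on a column of the outer-joined table, the NULL-extended rows are discarded anyway, so the LEFT JOIN returns the same rows as an INNER JOIN. Moving the condition into the ON clause is one way to keep the unregistered players.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ejl_players (id INTEGER PRIMARY KEY, surname TEXT);
CREATE TABLE ejl_registration (player_id INTEGER, team_id INTEGER, season INTEGER);
-- Player 1 is registered in 2008, player 2 only in 2007, player 3 never.
INSERT INTO ejl_players VALUES (1, 'Tamm'), (2, 'Kask'), (3, 'Saar');
INSERT INTO ejl_registration VALUES (1, 100, 2008), (2, 200, 2007);
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Inner join: only players with a 2008 registration.
inner_rows = run("""
    SELECT p.id, r.team_id
    FROM ejl_players p
    INNER JOIN ejl_registration r ON (r.player_id = p.id)
    WHERE r.season = 2008
    ORDER BY p.id
""")

# Left join with the filter in WHERE: rows where r.season is NULL
# fail the comparison, so the result matches the inner join.
left_where_rows = run("""
    SELECT p.id, r.team_id
    FROM ejl_players p
    LEFT JOIN ejl_registration r ON (r.player_id = p.id)
    WHERE r.season = 2008
    ORDER BY p.id
""")

# Left join with the filter in ON: every player is kept,
# with NULL for those without a 2008 registration.
left_on_rows = run("""
    SELECT p.id, r.team_id
    FROM ejl_players p
    LEFT JOIN ejl_registration r
        ON (r.player_id = p.id AND r.season = 2008)
    ORDER BY p.id
""")

print(inner_rows)
print(left_where_rows)
print(left_on_rows)
```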

It seems that your query is looking for players who have been on teams in more than one club this year (like your earlier question), and then outputting some details of their registration and club name. Here's how I would solve this query:

SELECT p.surname, p.name, p.id, r.start_date, r.end_date, c1.short_name
FROM ejl_players p
 INNER JOIN ejl_registration r ON (r.player_id = p.id)
 INNER JOIN ejl_teams t1 ON (r.team_id = t1.id)
 INNER JOIN ejl_clubs c1 ON (t1.club_id = c1.id)
 INNER JOIN ejl_teams t2 ON (r.team_id = t2.id)
 INNER JOIN ejl_clubs c2 ON (t2.club_id = c2.id)
WHERE r.season = 2008
GROUP BY r.player_id, r.team_id
HAVING COUNT(DISTINCT c2.id) > 1;

This works in MySQL because MySQL is permissive about the Single-Value Rule. That is, the columns in your GROUP BY clause don't have to be the same as the non-aggregated columns named in your select-list. In other brands of RDBMS, this query would generate an error.
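For comparison, here is a sketch of a variant that does not rely on that permissiveness, because every non-aggregated column in the select-list also appears in GROUP BY. This is not the query above: it assumes a second join to the registration table (alias r2, introduced here) so that the HAVING clause can count clubs across all of a player's registrations in the same season. When rows are grouped by player and team, each group contains a single team and therefore a single club, so a count of distinct clubs within such a group cannot exceed 1. The sketch runs on Python's built-in sqlite3 module with invented sample rows, not on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ejl_players (id INTEGER PRIMARY KEY, surname TEXT, name TEXT);
CREATE TABLE ejl_clubs (id INTEGER PRIMARY KEY, short_name TEXT);
CREATE TABLE ejl_teams (id INTEGER PRIMARY KEY, club_id INTEGER);
CREATE TABLE ejl_registration (
    player_id INTEGER, team_id INTEGER, season INTEGER,
    start_date TEXT, end_date TEXT);
INSERT INTO ejl_players VALUES (1, 'Tamm', 'Jaan'), (2, 'Kask', 'Mari');
INSERT INTO ejl_clubs VALUES (10, 'AAA'), (20, 'BBB');
INSERT INTO ejl_teams VALUES (100, 10), (200, 20);
-- Player 1 changes club within 2008; player 2 changes club between seasons.
INSERT INTO ejl_registration VALUES
    (1, 100, 2008, '2008-01-01', '2008-06-30'),
    (1, 200, 2008, '2008-07-01', '2008-12-31'),
    (2, 200, 2007, '2007-01-01', '2007-12-31'),
    (2, 100, 2008, '2008-01-01', '2008-12-31');
""")

query = """
    SELECT p.surname, p.name, p.id, r.start_date, r.end_date, c1.short_name
    FROM ejl_players p
    INNER JOIN ejl_registration r ON (r.player_id = p.id)
    INNER JOIN ejl_teams t1 ON (r.team_id = t1.id)
    INNER JOIN ejl_clubs c1 ON (t1.club_id = c1.id)
    -- r2/t2: all registrations of the same player in the same season
    INNER JOIN ejl_registration r2
        ON (r2.player_id = p.id AND r2.season = r.season)
    INNER JOIN ejl_teams t2 ON (r2.team_id = t2.id)
    WHERE r.season = 2008
    GROUP BY p.surname, p.name, p.id, r.start_date, r.end_date, c1.short_name
    HAVING COUNT(DISTINCT t2.club_id) > 1
    ORDER BY r.start_date
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the grouping is on the full select-list, two registrations with identical dates and club would collapse into one output row; grouping on a registration key would avoid that if the table has one.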

Bill Karwin
Bill, I didn't look at the original question you reference. Is it possible for any of these inner joins to return more than a single row? If so, you could get confusing results for r.start_date, r.end_date, or c1.short_name.
Noah Goodrich
I think the point is that they *do* return multiple rows. That is, a person can have multiple registrations. It's not a problem.
Bill Karwin
Though that's only for the person <-- registration relationship. The registration --> team and team --> club relationships are bound to only return a single row.
Bill Karwin
I use left joins to catch possible NULL values (some teams have no club attached to them).
Riho
BTW, your query returns 0 records.
Riho