views:

114

answers:

3

I am trying here to basically find users that do have sports & regions targeted by an activity. In the acces [users] table there is around 17K users. Each can have a certain number of sport interests and one region.

There query here look for each users that have one sport & one region at least that are targeted via the activities. Sports can be up to 75 when we select each of em [not quite good with an IN query].

SELECT a.user, pp.courriel
FROM acces a
LEFT JOIN acces_profil_sport ap ON ap.id = a.id
LEFT JOIN profil_perso pp ON pp.id = a.id
WHERE ap.sport_id IN
  (
    SELECT ac.sport_id
    FROM activite_sport ac
    RIGHT JOIN activite a ON a.activite_id = ac.activite_id AND a.is_cron = 1 AND a.cron_processed = 0
   )
  AND pp.region_id IN
  (
    SELECT ar.region_id
    FROM activite_region ar
    RIGHT JOIN activite a ON a.activite_id = ar.activite_id AND a.is_cron = 1 AND a.cron_processed = 0
  )
GROUP BY a.id

If I remove the sport lookup, the query takes arounds 30 secs to run. Otherwise it takes quite forever and use around 99% of the proc with mysql.

Any hints to help that ?

[edit : Table structure]
Acces : id (primary key), user, perso_id (Key/Foreign key to profil_perso[perso_id]) [some-other-fields]
profil_perso : perso_id (primary key) courriel, region_id, id (foreign key to acces[id]) [some other fields]
acces_profil_sport : id/sport_id (dual primary key), niveau_id (dual key with sport_id)

+4  A: 

I suspect your indexes are wrong. If you print out an explain select..., I can better comment on that. Further, I'm curious why you're doing left / right joins and subselects.

It seems to me that these should all be normal joins since the two left joins will only work if they exist. If they come up null, you won't get a row because of the required subselect matching.

As for the right joins, you need the ar bit there, which isn't part of the right side. I'd either remove them or make them straight joins as well. I assume since you're checking for what looks like unprocessed cron work, you want to keep them.

SELECT a.user, pp.courriel
FROM acces 
JOIN acces_profil_sport ap ON ap.id = a.id
JOIN profil_perso pp ON pp.id = a.id
JOIN activite_sport ac ON ac.sport_id = ap.sport_id
JOIN activite a1 ON a.activite_id = ac.activite_id AND a.is_cron = 1 AND a.cron_processed = 0
JOIN activite_region ar ON ar.region_id = pp.region_id
JOIN activite a2 ON a.activite_id = ar.activite_id AND a.is_cron = 1 AND a.cron_processed = 0
Autocracy
It's going to be processed by a cron nightly ... kind of a mailinglist-ish
Erick
Nice one ! only 0.4s ! it's a keeper thanks!
Erick
Neato, pretty much what I was going to suggest, nice solution!
Mark Canlas
Do you need a distinct on that as well? e.g. for multiple rows in acces_profile_sport for the same 'id' value... a user attached to multiple sports? Multiple activities for a given region?
araqnid
I added a GROUP BY statement, work like a charm. [Tho I'd be curious to check what's the quickest between distinct and group by
Erick
Well they both do different things. One is used to form aggregations, the other is used to eliminate duplicates. And, the answer is always measure, measure, measure.
Mark Canlas
A: 

Do you have indices on is_cron and cron_processed? It could help speed things up.

James Socol
A: 
SELECT acces.user, courriel
FROM acces
JOIN profil_perso ON acces.id = profil_perso.id
WHERE EXISTS (SELECT 1 FROM acces_profil_sport JOIN activite_sport on acces_profil_sport.sport_id = activite_sport.sport_id JOIN activite ON activite.activite_id = activite_sport.activite_id WHERE is_cron = 1 AND cron_processed = 0 AND acces_profil_sport.id = profil_perso.id)
AND EXISTS (SELECT 1 FROM activite_region JOIN activite ON activite_region.activite_id = activite.activite_id WHERE is_cron = 1 AND cron_processed = 0 AND activite_region.region_id = profil_perso.region_id);
araqnid
Why use a subselect where a join solves it? Subselects tend to create more overhead, and in some cases, run appreciably slower.
Autocracy
As I mentioned in a comment above, I think there might be a 'distinct' required to just use joins. Using a "where exists" to join tables works well in most DBs, although I admit I don't use MySQL much and wasn't aware it's something to avoid there.
araqnid