tags:

views:

80

answers:

2

I have a mysql database that is tracking hockey stats. What I'd like to do is in one query get the number of goals and assists scored by each player as well as the number of games that they've played in. I'm using Zend Framework and the query that I've build is this:

SELECT `p`.*, 
       `pxt`.`jersey_number`, 
       count(pxg.player_x_game_id) AS `games`, 
       count(goals.scoring_id) AS `goals`, 
       count(assists.scoring_id) AS `assists` 
FROM  `players` AS `p` 
INNER JOIN `players_x_teams` AS `pxt` ON p.player_id = pxt.player_id 
INNER JOIN `teams_x_seasons` AS `txs` ON pxt.team_id = txs.team_id 
INNER JOIN `seasons` AS `s` ON txs.season_id = s.season_id 
INNER JOIN `games` AS `g` ON g.season_id = s.season_id
INNER JOIN `players_x_games` AS `pxg` ON pxg.game_id = g.game_id 
                                     AND pxg.player_id = p.player_id 
LEFT JOIN `scoring` AS `goals` ON goals.game_id = g.game_id 
                              AND goals.scorer_id = p.player_id 
LEFT JOIN `scoring` AS `assists` ON assists.game_id = g.game_id 
                                AND (assists.assist1_id = p.player_id OR assists.assist2_id = p.player_id) 
WHERE (pxt.team_id = 1) 
  AND (txs.season_id = '23') 
  AND (pxt.date_added <= s.end_date OR pxt.date_added is null) 
  AND (pxt.date_removed >= s.start_date OR pxt.date_removed is null) 
GROUP BY `p`.`player_id`

This query returns me data, but my counts are off.

+-----------+---------------+-------+-------+---------+
| player_id | jersey_number | games | goals | assists |
+-----------+---------------+-------+-------+---------+
|         2 | 3             |     7 |     1 |       3 | 
|         3 | 19            |     6 |     1 |       0 | 
|         8 | 8             |     7 |     3 |       2 | 
|         9 | 11            |    13 |    10 |       8 | 
|        11 | 96            |     6 |     1 |       3 | 
|        12 | 14            |     6 |     0 |       3 | 
|        13 | 7             |     6 |     0 |       1 | 
|       115 | 39            |     9 |     6 |       2 | 
|       142 | 68            |     6 |     0 |       1 | 
|       143 | 30            |     6 |     0 |       0 | 
|       150 | 41            |    11 |    11 |       5 | 
|       185 | 17            |     6 |     6 |       3 | 
|       225 | 97            |     4 |     1 |       3 | 
+-----------+---------------+-------+-------+---------+

In this dataset the most games that should be present are 6, but as you can see I'm getting extras. If I adjust my query to remove the goals and assists fields my games count comes out correct. In fact if I only select one of my counted rows I always get the correct counts, but once I add a second or third count my numbers start to get skewed. What am I doing wrong?

+3  A: 

Maybe you need count(DISTINCT pxg.player_x_game_id)...? Looks like there might be duplicates in that humungous megajoin (which I admit I haven't actually taken time to fully reproduce!-)...

Alex Martelli
Distinct it is. I thought I'd prevented duplicates with my join logic, but I guess I missed something. Thanks.
Rob Booth
@Rob, you're welcome!
Alex Martelli
+2  A: 

Since you are doing multiple joins which may each match multiple rows and carry over to the next join, you'll need to add distinct in your count. Try this:

SELECT `p`.*, 
       `pxt`.`jersey_number`, 
       count(distinct pxg.player_x_game_id) AS `games`, 
       count(distinct goals.scoring_id) AS `goals`, 
       count(distinct assists.scoring_id) AS `assists` 
FROM  `players` AS `p` 
INNER JOIN `players_x_teams` AS `pxt` ON p.player_id = pxt.player_id 
INNER JOIN `teams_x_seasons` AS `txs` ON pxt.team_id = txs.team_id 
INNER JOIN `seasons` AS `s` ON txs.season_id = s.season_id 
INNER JOIN `games` AS `g` ON g.season_id = s.season_id
INNER JOIN `players_x_games` AS `pxg` ON pxg.game_id = g.game_id 
                                     AND pxg.player_id = p.player_id 
LEFT JOIN `scoring` AS `goals` ON goals.game_id = g.game_id 
                              AND goals.scorer_id = p.player_id 
LEFT JOIN `scoring` AS `assists` ON assists.game_id = g.game_id 
                                AND (assists.assist1_id = p.player_id OR assists.assist2_id = p.player_id) 
WHERE (pxt.team_id = 1) 
  AND (txs.season_id = '23') 
  AND (pxt.date_added <= s.end_date OR pxt.date_added is null) 
  AND (pxt.date_removed >= s.start_date OR pxt.date_removed is null) 
GROUP BY `p`.`player_id`
Asaph