views: 1419
answers: 9

I'm looking for a better way to do the following query. I have a table that looks like this:

game_id | home_team_id | away_team_id
1       | 100          | 200
2       | 200          | 300
3       | 200          | 400
4       | 300          | 100
5       | 100          | 400

And I want to write a query that counts the number of home games and away games for each team and outputs the following:

team_id | home_games | away_games
100     | 2          | 1
200     | 2          | 1
300     | 1          | 1
400     | 0          | 2

Right now, I have this monstrosity that works, but it's slow (I know it's pulling all 2,800 rows from the table twice).

SELECT 
  home_team_id as team_id,
  (SELECT count(*) FROM `game` WHERE home_team_id = temp_game.home_team_id) as home_games,
  (SELECT count(*) FROM `game` WHERE away_team_id = temp_game.home_team_id) as away_games
  FROM (SELECT * FROM `game`) as temp_game
  GROUP BY home_team_id

Can a SQL guru help me knock out a better way? I think my problem is that I don't understand how to get a distinct list of the team IDs to throw at the count queries. I bet there's a better way with a better placed, nested SELECT. Thanks in advance!

+1  A: 

Maybe this could help you:

http://stackoverflow.com/questions/248990/summarize-aggregated-data#249272

See my answer using Pivot() in the same thread.

MarlonRibunal
+2  A: 

If you want the distinct list of teams, you have to select from the game table twice, unioning the home and the away teams. Theoretically, one team could play all its games on the road or at home; if you have logic that prevents that, you could simplify this query:

select home_team_id as team_id from game union
select away_team_id as team_id from game

The union operator ensures you only get distinct elements in the result set (unless you use union all).
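
To see the difference between union and union all on the sample data, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module as a stand-in for your actual database; the table and rows are the ones from the question, and the variable names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game (game_id INTEGER, home_team_id INTEGER, away_team_id INTEGER)"
)
conn.executemany(
    "INSERT INTO game VALUES (?, ?, ?)",
    [(1, 100, 200), (2, 200, 300), (3, 200, 400), (4, 300, 100), (5, 100, 400)],
)

# UNION removes duplicates, so each team appears exactly once.
distinct_teams = conn.execute(
    "SELECT home_team_id AS team_id FROM game "
    "UNION "
    "SELECT away_team_id AS team_id FROM game "
    "ORDER BY team_id"
).fetchall()

# UNION ALL keeps every row: 5 home entries plus 5 away entries.
all_rows = conn.execute(
    "SELECT home_team_id AS team_id FROM game "
    "UNION ALL "
    "SELECT away_team_id AS team_id FROM game"
).fetchall()

print(distinct_teams)  # [(100,), (200,), (300,), (400,)]
print(len(all_rows))   # 10
```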

From there, you can use left outer joins to aggregate your data:

select
    -- distinct is needed: the two joins pair every home game with every away game
    u.team_id, count(distinct h.game_id) as home_games, count(distinct a.game_id) as away_games
from
    (
     select home_team_id as team_id from game union
     select away_team_id as team_id from game
    ) as u
     left outer join game as h on h.home_team_id = u.team_id
     left outer join game as a on a.away_team_id = u.team_id
group by
    u.team_id

If you want to reduce your table scans even further (the above will produce four), you can, at the cost of more code. First, get a list of rows with the team_id and whether the game was played at home or away:

select
    case ha.home when 0 then g.away_team_id else g.home_team_id end as team_id,
    case ha.home when 0 then 0 else 1 end as home_games,
    case ha.home when 0 then 1 else 0 end as away_games
from
    game as g, (select 0 as home union select 1 as home) as ha

From there, you can simply sum up the games at home and away for each team:

select
    t.team_id, sum(t.home_games) as home_games, sum(t.away_games) as away_games
from
    (
     select
      case ha.home when 0 then g.away_team_id else g.home_team_id end as team_id,
      case ha.home when 0 then 0 else 1 end as home_games,
      case ha.home when 0 then 1 else 0 end as away_games
     from
      game as g, (select 0 as home union select 1 as home) as ha
    ) as t
group by
    t.team_id

This will result in a single table scan.
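
As a sanity check, the query above can be run against the question's sample rows. This is only a sketch: it assumes SQLite via Python's sqlite3 module rather than MySQL, writes the comma join as an explicit CROSS JOIN, and uses union all for the two-row helper table. It checks the output only and says nothing about how many scans a given engine performs.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game (game_id INTEGER, home_team_id INTEGER, away_team_id INTEGER)"
)
conn.executemany(
    "INSERT INTO game VALUES (?, ?, ?)",
    [(1, 100, 200), (2, 200, 300), (3, 200, 400), (4, 300, 100), (5, 100, 400)],
)

# Each game row is paired with home = 0 and home = 1, producing one
# "away" row and one "home" row per game; the outer query sums the flags.
query = """
SELECT t.team_id,
       SUM(t.home_games) AS home_games,
       SUM(t.away_games) AS away_games
FROM (
    SELECT CASE ha.home WHEN 0 THEN g.away_team_id ELSE g.home_team_id END AS team_id,
           CASE ha.home WHEN 0 THEN 0 ELSE 1 END AS home_games,
           CASE ha.home WHEN 0 THEN 1 ELSE 0 END AS away_games
    FROM game AS g
    CROSS JOIN (SELECT 0 AS home UNION ALL SELECT 1 AS home) AS ha
) AS t
GROUP BY t.team_id
ORDER BY t.team_id
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
# (100, 2, 1)
# (200, 2, 1)
# (300, 1, 1)
# (400, 0, 2)
```

The result matches the table the question asks for, including the 0 home games for team 400.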

casperOne
+1  A: 

Greg,

I think your ultimate solution will be vendor-specific. But if you were doing this in Oracle, you could query the table only once with the following:

SELECT game.home_team_id AS team_id,
       SUM(CASE WHEN game.home_team_id = game.away_team_id
                THEN 1
                ELSE 0 END) AS home_games,
       SUM(CASE WHEN game.home_team_id <> game.away_team_id
                THEN 1
                ELSE 0 END) AS away_games
  FROM game
GROUP BY game.home_team_id
ORDER BY game.home_team_id;

You don't say what flavor of SQL you're using so this is the best I can do.

Best of luck,

Stew

p.s. It looks like I've given the same solution as MarlonRibunal. I just didn't have a handy link so had to create the code by hand. :-/

Stew S
I'm using MySQL 5.0. I was hoping it could be done without leaning on vendor-specific details...but oh well!
Greg
Greg, I tried to write this as vendor-neutral as I could. I know CASE is supported in multiple variants, so I'd say give it a try and see what happens.
Stew S
I gave it a try ... it's fast but the results were off a bit. I see an accurate listing of the home games (I know what the totals 'should be'), but they are under the label of 'away games'. Also, the 'home games' column is all zeros, when I know it should have values. Any idea on what to tweak?
Greg
This doesn't work because not all teams have had home games.
Nathan Feger
Are you sure you wrote it the way I did? It looks right to me. Care to provide a larger dataset in the form of INSERT statements? The set you provided didn't include any home games so I couldn't check.
Stew S
No, there's definitely home and away games for every team in my data set (2008 MLB Season). I wonder if it's a vendor difference? Thanks for the help, Stewstools, but I went with the 2nd table concept given by Frank Flynn.
Greg
Every game is a 'home' game for one team ... there's a home team and an away team. I guess I'm not following your question?
Greg
Sorry, my mistake in the query. See correction below.
Stew S
A: 

Try this:

Select Z.teamId, 
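    -- Distinct is needed because the two joins pair each home game with each away game
    Count(Distinct H.Game_Id) HomeGames, 
    Count(Distinct A.Game_Id) AwayGames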
From (Select Distinct home_team_id TeamId From Game
        Union 
      Select Distinct away_team_id TeamId From Game) Z
   Left Join Game H On H.home_team_id = Z.TeamId
   Left Join Game A On A.away_team_id = Z.TeamId
Group By Z.TeamId
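
One thing to watch when left joining the same table twice: every home row for a team gets paired with every away row for that team, so a plain COUNT counts pairs while COUNT(DISTINCT ...) counts games. A small sketch of the difference on the question's sample rows, assuming SQLite via Python's sqlite3 module (the TEMPLATE name and the lowercase aliases are just for illustration):

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game (game_id INTEGER, home_team_id INTEGER, away_team_id INTEGER)"
)
conn.executemany(
    "INSERT INTO game VALUES (?, ?, ?)",
    [(1, 100, 200), (2, 200, 300), (3, 200, 400), (4, 300, 100), (5, 100, 400)],
)

# Same query shape as the answer; {kw} is either empty or DISTINCT.
TEMPLATE = """
SELECT z.team_id,
       COUNT({kw} h.game_id) AS home_games,
       COUNT({kw} a.game_id) AS away_games
FROM (SELECT home_team_id AS team_id FROM game
      UNION
      SELECT away_team_id AS team_id FROM game) AS z
LEFT JOIN game AS h ON h.home_team_id = z.team_id
LEFT JOIN game AS a ON a.away_team_id = z.team_id
GROUP BY z.team_id
ORDER BY z.team_id
"""

plain = conn.execute(TEMPLATE.format(kw="")).fetchall()
distinct = conn.execute(TEMPLATE.format(kw="DISTINCT")).fetchall()

print(plain)     # [(100, 2, 2), (200, 2, 2), (300, 1, 1), (400, 0, 2)]
print(distinct)  # [(100, 2, 1), (200, 2, 1), (300, 1, 1), (400, 0, 2)]
```

Teams 100 and 200 each have two home games and one away game, so the plain count reports 2 for both columns; only the distinct version matches the expected output.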
Charles Bretana
Union is not bad, but using the teams table should solve the same problem for less cost.
Nathan Feger
What team table? I didn't see one in his schema... a team table would be much better.
Charles Bretana
A: 

declare @ts table
(
    team_id int
)

declare @t table
(
    id int,
    h int,
    a int
)

insert into @ts values (100)
insert into @ts values (200)
insert into @ts values (300)
insert into @ts values (400)

insert into @t values (1, 100, 200)
insert into @t values (2, 200, 300)
insert into @t values (3, 200, 400)
insert into @t values (4, 300, 100)
insert into @t values (5, 100, 400)

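-- coalesce turns the NULL from an unmatched left join into 0 (e.g. team 400 has no home games)
select s.team_id, coalesce(t0.home, 0) as home_games, coalesce(t1.away, 0) as away_games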
from @ts s
    left outer join (select team_id, count(h) as [home] from @ts inner join @t on h = team_id group by team_id) t0 on t0.team_id = s.team_id
    left outer join (select team_id, count(a) as away from @ts inner join @t on a = team_id group by team_id) t1 on t1.team_id = s.team_id
Austin Salonen
+4  A: 

It's cleaner if you have another table team with team_id and team_name.

SELECT team_id, team_name, 
     sum(team_id = home_team_id) as home_games, 
     sum(team_id = away_team_id) as away_games
 FROM game, team
 GROUP BY team_id

What's going on: the absence of a WHERE clause causes a Cartesian product between the two tables; we group by team_id to get back to one row per team. Each team_id is now paired with every row of the game table, so you need to count the matching ones, but the SQL count function isn't quite right here (it would count all the rows or all the distinct rows). So we write team_id = home_team_id, which MySQL evaluates to 1 or 0, and use sum to add up the 1's.

The team_name is just because it's geeky to say that 'team 200 had 20 home games' when we ought to say that 'Mud City Stranglers had 20 home games'.

P.S. This works even for a team with no games (often a problem in SQL, where a team with 0 games drops out of the result because the join finds no match).
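
A quick way to try the idea on the question's sample rows is sketched below. It assumes SQLite via Python's sqlite3 module (which, like MySQL, evaluates a comparison to 1 or 0), a hypothetical team table with placeholder names, and an extra team 500 with no games that is not in the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game (game_id INTEGER, home_team_id INTEGER, away_team_id INTEGER)"
)
conn.executemany(
    "INSERT INTO game VALUES (?, ?, ?)",
    [(1, 100, 200), (2, 200, 300), (3, 200, 400), (4, 300, 100), (5, 100, 400)],
)

# Hypothetical team table; names are placeholders, team 500 has no games.
conn.execute("CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT)")
conn.executemany(
    "INSERT INTO team VALUES (?, ?)",
    [(100, "Team 100"), (200, "Team 200"), (300, "Team 300"),
     (400, "Team 400"), (500, "Team 500")],
)

# Every team is paired with every game; each comparison yields 1 or 0,
# and SUM adds up the 1's per team.
query = """
SELECT team.team_id,
       team.team_name,
       SUM(team.team_id = game.home_team_id) AS home_games,
       SUM(team.team_id = game.away_team_id) AS away_games
FROM team
CROSS JOIN game
GROUP BY team.team_id, team.team_name
ORDER BY team.team_id
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
# (100, 'Team 100', 2, 1)
# (200, 'Team 200', 2, 1)
# (300, 'Team 300', 1, 1)
# (400, 'Team 400', 0, 2)
# (500, 'Team 500', 0, 0)
```

Note that the cross product grows as teams times games, so this trades join logic for a larger intermediate result.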

Frank Flynn
Wow, I must have a serious brain cramp because that makes perfect sense. I do actually have a team table...but I wanted to exercise my brain a bit (and then got stuck and came to SO!). :)
Greg
I bow down to your awesomeness!
Nathan Feger
Not really a good answer, because it assumes artifacts that were not known or presented. I'm thrilled that it worked for the original poster, but it really isn't apt. It's just coincidental that it worked for the original poster.
casperOne
A: 

Here is another example. I would point out though that you should start your from clause from the teams table, so that you'll be sure to include all the teams, even if they haven't played a game yet.

This query does your two queries as joins instead of subselects, which should perform better.

-- note: coalesce is like ifnull in case you are using mysql.

SELECT
    teams.team_id as team_id,
    coalesce(home_game_counts.games, 0) home_games,
    coalesce(away_game_counts.games, 0) away_games
FROM teams
    left join (select home_team_id, count(*) games from games group by home_team_id) as home_game_counts
        on home_game_counts.home_team_id = teams.team_id
    left join (select away_team_id, count(*) games from games group by away_team_id) as away_game_counts
        on away_game_counts.away_team_id = teams.team_id
GROUP BY teams.team_id, home_game_counts.games, away_game_counts.games
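
The pre-aggregate-then-join approach can be tried on the question's sample rows with the sketch below. It assumes SQLite via Python's sqlite3 module, a hypothetical teams table (the question does not show one), the question's game table name, and an extra team 500 with no games to show the coalesce at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game (game_id INTEGER, home_team_id INTEGER, away_team_id INTEGER)"
)
conn.executemany(
    "INSERT INTO game VALUES (?, ?, ?)",
    [(1, 100, 200), (2, 200, 300), (3, 200, 400), (4, 300, 100), (5, 100, 400)],
)

# Hypothetical teams table; team 500 has not played any game.
conn.execute("CREATE TABLE teams (team_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO teams VALUES (?)", [(100,), (200,), (300,), (400,), (500,)]
)

# Each subquery aggregates to one row per team before joining, so the
# joins cannot multiply rows; COALESCE maps "no matching row" to 0.
query = """
SELECT t.team_id,
       COALESCE(h.games, 0) AS home_games,
       COALESCE(a.games, 0) AS away_games
FROM teams AS t
LEFT JOIN (SELECT home_team_id, COUNT(*) AS games
           FROM game GROUP BY home_team_id) AS h
       ON h.home_team_id = t.team_id
LEFT JOIN (SELECT away_team_id, COUNT(*) AS games
           FROM game GROUP BY away_team_id) AS a
       ON a.away_team_id = t.team_id
ORDER BY t.team_id
"""
rows = conn.execute(query).fetchall()

print(rows)
# [(100, 2, 1), (200, 2, 1), (300, 1, 1), (400, 0, 2), (500, 0, 0)]
```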

Nathan Feger
A: 

This solution is rather ugly, but it should work quickly across large datasets:

select
  teams.team_id
 ,case when home.home_game_count is null
       then 0
       else home.home_game_count
  end as home_game_count
 ,case when away.away_game_count is null
       then 0
       else away.away_game_count
  end as away_game_count
from
  ( 
  select home_team_id as team_id from games
  union
  select away_team_id as team_id from games  
  ) teams
  left outer join
  (  
  select home_team_id as team_id, count(*) as home_game_count
  from games
  group by home_team_id
  ) home
  on teams.team_id = home.team_id
  left outer join
  (
  select away_team_id as team_id, count(*) as away_game_count
  from games
  group by away_team_id
  ) away  
  on teams.team_id = away.team_id
JosephStyons
A: 

Sorry, my mistake in the away_games clause. I changed the comparison operator (to <>) instead of changing the resulting value. I had to create additional data to see the problem.

-- Join each team to every game it played in, then classify each game as home or away.
SELECT teams.team_id,
       teams.team_name,
       SUM(CASE
               WHEN game.home_team_id = teams.team_id THEN
                1
               ELSE
                0
           END) AS home_games,
       SUM(CASE
               WHEN game.away_team_id = teams.team_id THEN
                1
               ELSE
                0
           END) AS away_games
  FROM teams
  LEFT OUTER JOIN game ON teams.team_id IN (game.home_team_id, game.away_team_id)
 GROUP BY teams.team_id, teams.team_name
Stew S