views:

120

answers:

2

Hi,

Lets say I am analyzing how high school sports records affect school attendance.

So I have a table in which each row corresponds to a high school basketball game. Each game has an away team id and a home team id (FK to another "team table") and a home score and an away score and a date. I am writing a query that matches attendance with this seasons basketball games.

My sample output will be (#_students_missed_class, day_of_game, home_team, away_team, home_team_wins_this_season, away_team_wins_this_season)

I now want to add how each team did the previous season to my analysis. Well, I have their previous season stored in the game table but i should be able to accomplish that with a subselect.

So in my main select statement I add the subselect:

SELECT COUNT(*) FROM game_table
WHERE game_table.date
BETWEEN 'start of previous season' AND 'end of previous season'
AND (
  (game_table.home_team = team_table.id
    AND game_table.home_score > game_table.away_score)
  OR (game_table.away_team = team_table.id
    AND game_table.away_score > game_table.home_score))

In this case team-table.id refers to the id of the home_team so I now have all their wins calculated from the previous year.

This method of calculation is neither time nor resource intensive. The Explain SQL shows that I have ALL in the Type field and I am not using a Key and the query times out. I'm not sure how I can accomplish a more efficient query with a subselect. It seems proposterously inefficient to have to write 4 of these queries (for home wins, home losses, away wins, away losses).

I am sure this could be more lucid. I'll absolutely add color tomorrow if anyone has questions

A: 

In my opinion, this is another case of "trying to get my rdbms to do everything for me". You have to accept that you can't optimise your database for some scenarios / queries, and no matter how many indexes you add, things like sub-queries are going to slow down the larger parent query.

You above example is a clear case of this: the data you are showing from the sub-query (the performance of a team in the previous season) is static data - it's historical and can't / won't change. You should query for this data separately, hold it in-memory in your application, and manually add it to the results of you main query as you loop over it.

iAn
Thanks that makes sense. For the sake of athis example say I had to generate the results in SQL. Any suggestions?
Arnold
A: 

Your best option in MySQL would be to have a view that you can join to as many times as you need.

If MySQL supported CTEs then you could do it all in a single statement...

with prior_season as (
  select ...
)
  select
     ...
  join prior_season as hone_wins on ...
  join prior_season as home_losses on ...
  ... etc

If this is something you're going to want to do a lot, it might be a decent argument for switching to a different RDBMS. Postgres supports them now, and pretty much all of the major commercial vendor's free versions do as well.

Donnie