I'll try to give this as generically as I can so it's reusable.

I am running a site with a fairly large MySQL database which has grown to need some summary/rollup tables initialized. For the example's sake, let's say it's soccer statistics. I handle multiple soccer leagues in the same database, and many of them play games of different lengths: for instance, indoor soccer leagues play four quarters while most outdoor leagues play halves.

I have three tables important to this exercise. I've redacted all of the fields that I don't consider significant to the answer I'm looking for.

GAME
`game`.id
`game`.home_team_id
`game`.away_team_id
`game`.number_of_periods

GOAL 
// Records for each goal scored in the game   
`goal`.id
`goal`.game_id
`goal`.team_id
`goal`.period_number
`goal`.player_id
`goal`.assist_player_id

PERIOD_SUMMARY
`period`.id
`period`.game_id
`period`.team_id
`period`.number
`period`.goals_scored

Ultimately I should have records for EVERY period played in the period summary table, regardless of whether or not a goal was scored. This table only needs to be initialized once; after that it's fairly easy to add the appropriate zero-filled records via a trigger on game creation and to have triggers fire on insert/update requests to keep the period_summary table current.

It is also fairly easy for me to group all of the goals and initialize the period summary table with the SUM(). What I am having a bit of trouble figuring out is an efficient way to "fill" any periods that don't have a goal scored with a 0.

What I am trying to figure out is if it's easier/more efficient to:

  1. Write the trigger and prefill the entire period_summary table with 0-filled values, then run the query I already know to update the appropriate records for periods in which goals were scored.
  2. Use some other method (perhaps a temporary stored procedure?) that will only 0-fill records where there is not a match in the goals table.
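
For reference, option 2 may not need a stored procedure at all. The sketch below is only an illustration under some assumptions: it runs against an in-memory SQLite database from Python so that it is self-contained, and it relies on a small helper table of period numbers (`period_numbers` is a made-up name, not part of the schema above). The INSERT ... SELECT itself is plain SQL that should carry over to MySQL, but that has not been tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY, home_team_id INT, away_team_id INT,
                   number_of_periods INT);
CREATE TABLE goal (id INTEGER PRIMARY KEY, game_id INT, team_id INT,
                   period_number INT);
CREATE TABLE period_summary (id INTEGER PRIMARY KEY, game_id INT, team_id INT,
                             number INT, goals_scored INT);
-- Hypothetical helper table: one row per possible period number.
CREATE TABLE period_numbers (n INTEGER PRIMARY KEY);
INSERT INTO period_numbers (n) VALUES (1), (2), (3), (4);

-- Sample data: game 1 has 2 halves, game 2 has 4 quarters.
INSERT INTO game VALUES (1, 10, 20, 2), (2, 30, 40, 4);
INSERT INTO goal (game_id, team_id, period_number) VALUES
    (1, 10, 1), (1, 10, 1), (1, 20, 2), (2, 40, 3);
""")

# Build every (game, team, period) slot, then LEFT JOIN the goal counts;
# slots without a matching count come back NULL and COALESCE turns them into 0.
conn.execute("""
INSERT INTO period_summary (game_id, team_id, number, goals_scored)
SELECT s.game_id, s.team_id, s.number, COALESCE(c.goals, 0)
FROM (
    SELECT g.id AS game_id, t.team_id, p.n AS number
    FROM game g
    JOIN (SELECT id AS game_id, home_team_id AS team_id FROM game
          UNION ALL
          SELECT id, away_team_id FROM game) t ON t.game_id = g.id
    JOIN period_numbers p ON p.n <= g.number_of_periods
) s
LEFT JOIN (
    SELECT game_id, team_id, period_number, COUNT(*) AS goals
    FROM goal
    GROUP BY game_id, team_id, period_number
) c ON c.game_id = s.game_id AND c.team_id = s.team_id
   AND c.period_number = s.number
""")

rows = conn.execute(
    "SELECT game_id, team_id, number, goals_scored FROM period_summary "
    "ORDER BY game_id, team_id, number").fetchall()
for row in rows:
    print(row)
```

With the sample data this produces one row per team per period (4 rows for game 1, 8 for game 2), with zeros wherever no goal was recorded.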
A: 

ISTM that option 1 is clearly easier: you already know how to bump a counter if you can trust that the counter is already there. If you went with option 2, not only would it be more difficult to fill in the missing zeros (I presume that would happen at the end of the period), you would also have to find a way to start the counter at 1 if there is no previous entry when the first goal is scored.

As for space efficiency: ultimately, you will need the same space on disk either way. It would be slightly more efficient to only fill the zeros at the end of a period, but surely the space that completed periods will take up is larger than the space for running periods, anyway.

As for insert/update efficiency: you will need to perform a lookup when a goal is scored either way, because there might be a non-zero counter already. So you need to create an index that allows efficient lookup by game, team, and period. Given that the query that always does an update is shorter, there is a good chance that it is also more efficient.
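
A minimal sketch of the "always update" variant, assuming the zero rows were already prefilled. It uses in-memory SQLite from Python purely so it can be run as is; the index name `period_lookup` and the helper `record_goal` are invented for illustration, and in MySQL the same UPDATE would typically live in a trigger on the goal table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period_summary (
    id INTEGER PRIMARY KEY, game_id INT, team_id INT, number INT,
    goals_scored INT NOT NULL DEFAULT 0
);
-- One row per (game, team, period); the index also serves the lookup.
CREATE UNIQUE INDEX period_lookup ON period_summary (game_id, team_id, number);

-- Zero-filled rows, as a trigger on game creation might add them
-- for a two-period game 1 between teams 10 and 20.
INSERT INTO period_summary (game_id, team_id, number) VALUES
    (1, 10, 1), (1, 10, 2), (1, 20, 1), (1, 20, 2);
""")


def record_goal(game_id, team_id, period_number):
    """Bump the existing counter; returns the number of rows updated."""
    cur = conn.execute(
        "UPDATE period_summary SET goals_scored = goals_scored + 1 "
        "WHERE game_id = ? AND team_id = ? AND number = ?",
        (game_id, team_id, period_number))
    return cur.rowcount


record_goal(1, 10, 1)
record_goal(1, 10, 1)
record_goal(1, 20, 2)

rows = conn.execute(
    "SELECT team_id, number, goals_scored FROM period_summary "
    "ORDER BY team_id, number").fetchall()
print(rows)
```

Because the row is guaranteed to exist, the write path is a single UPDATE; a return value of 0 from `record_goal` would signal that the prefill missed a period.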

Martin v. Löwis
+2  A: 

You already have a placeholder. The "placeholder for unknown data" in SQL is null.

You don't need to pre-fill anything: either you have a row with some columns having an unknown value (null), or you have no row at all, so that doing an outer join will get a row that is all null. Either way, the attribute data (essentially, non-id fields) will be null.

And the sum() aggregate will ignore nulls.

So let's say that you do have a row for a game (since it's pre-scheduled), but no corresponding rows for its periods (since they have not yet been played). Then you do an outer join from game to period (outer, so that you include both games with and games without period data):

select a.*, sum(b.goals_scored)
from game a left outer join period b on (b.game_id = a.id)
group by a.id;

This shows you the total goals (for both teams) by game; for games with no periods, you get back null (which in SQL means "we don't (yet) know").
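
To make the null behaviour concrete, here is a small runnable sketch using in-memory SQLite from Python (an assumption made for self-containment; the thread is about MySQL). The sample rows are invented, and the extra `goals_or_zero` column is an addition, showing how a report could still display 0 without storing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY, number_of_periods INT);
CREATE TABLE period (id INTEGER PRIMARY KEY, game_id INT, team_id INT,
                     number INT, goals_scored INT);
-- Game 1 has period rows; game 2 is scheduled but not yet played.
INSERT INTO game VALUES (1, 2), (2, 2);
INSERT INTO period (game_id, team_id, number, goals_scored) VALUES
    (1, 10, 1, 2), (1, 20, 1, 0), (1, 10, 2, 1), (1, 20, 2, NULL);
""")

# SUM skips the NULL goals_scored in game 1 and yields NULL for game 2,
# which has no period rows at all.
totals = conn.execute("""
SELECT a.id, SUM(b.goals_scored) AS goals,
       COALESCE(SUM(b.goals_scored), 0) AS goals_or_zero
FROM game a LEFT OUTER JOIN period b ON b.game_id = a.id
GROUP BY a.id
ORDER BY a.id
""").fetchall()
print(totals)
```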

This query shows you only the total goals for completed games and games in progress (games for which at least one period has been played):

select a.*, sum(b.goals_scored)
from game a join period b on (b.game_id = a.id)
group by a.id;

This view filters out incomplete games (assuming you always add early periods before later ones):

-- a game counts as complete once a row exists for its final period
create view complete_games as
select a.* from game a
where exists (select * from period b
where b.game_id = a.id and b.number = a.number_of_periods);

Using that view, we can then sum only completed games:

select a.*, sum(b.goals_scored)
from complete_games a join period b on (b.game_id = a.id)
group by a.id;

So, no need to pre-fill, no need for a trigger, most importantly, no need to add false data (claiming zero goals when in fact the period has not yet been played), no need to update with correct data. Just insert the period when you have data for it.

tpdi
Sorry, my description may have been confusing in my original question. Note that the summary table doesn't exist yet and the main goal of this exercise is to initialize that table with data. I do need to create "0" records for all periods played in a game (in the past) where no goal was scored.
AvatarKava
Yeah, my point is, you don't need to "initialize"; just add records when you have data after the period has been played.
tpdi
By that I meant filling the table with data from games that have already been played.
AvatarKava
A: 

Are you aware that you can set the default value of a column with the DEFAULT keyword?

For example:

CREATE TABLE Person (age INT DEFAULT 0, name VARCHAR(35) DEFAULT 'Bob');

The default value will be what you want (0 for instance) instead of null.

It doesn't solve everything but it will help.
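
Applied to the summary table from the question, that could look something like the sketch below. It uses in-memory SQLite from Python only so that it runs on its own; the column definition `goals_scored INT NOT NULL DEFAULT 0` is an assumption about how the table might be declared, not something taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE period_summary (
    id INTEGER PRIMARY KEY, game_id INT, team_id INT, number INT,
    goals_scored INT NOT NULL DEFAULT 0
)""")

# goals_scored is omitted here, so the column default (0) applies.
conn.execute(
    "INSERT INTO period_summary (game_id, team_id, number) VALUES (1, 10, 1)")
# An explicit value still overrides the default.
conn.execute(
    "INSERT INTO period_summary (game_id, team_id, number, goals_scored) "
    "VALUES (1, 10, 2, 3)")

rows = conn.execute(
    "SELECT number, goals_scored FROM period_summary ORDER BY number").fetchall()
print(rows)
```

The default only fills the column for rows that get inserted; it does not create the missing period rows themselves.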

Silence