views:

79

answers:

3

I'm doing a data model for a roller derby league to track their matches. I track things like lap times, penalties per lap, penalties per period, and penalties per match.

The problem is that in some cases, I will only have the overall data; I might have "penalties per match" for one match and "penalties per period" for another. So at the lowest level, for some matches, I'll have the very detailed data (penalties per hap), and at the highest level I'll have penalties per match.

I'm not sure how to model/use this to do reporting when I don't have a high detail for some records. I thought about something like this:

PenaltiesPerMatch MatchID PenaltyCount

PenaltiesPerPeriod MatchID PeriodID PenaltyCount

PenaltiesPerLap MatchID PeriodID LapID PenaltyCount

But my concern is that the higher-level information can be derived from lower level. Do I duplicate records (e.g. fill in a record for penalties per period with data that is also in penalties per lap, summed by period?) or keep unique records (don't put in penalties per period for data that I have already in penalties per lap; calculate it by summing on period).

+2  A: 

What I would do is record the information that you have. For some matches, record it in high detail, for others in low detail.

When you report on the matches:

  • Calculate the sums per match for the high detail matches
  • Use the sum per match from the low detail matches

Store data at the lowest detail level that you have; calculate the higher detail levels.

Andomar
how would you handle the situation where there is only high level data? In this case a calculation would not work.
Aaron M
The calculation turns the highly detailed (per quarter) records into low detail (per match) records. So if everything is in low detail, there is no need for a calculation.
Andomar
Yes, but the OP states that in some cases he may not have low level data, only the high level data. If he doesn't always have the lowest level data, then he cannot calculate the high levels of data.
Aaron M
For each match, you either have the high or the low level data. For the first kind of match, you already have the high level data. For the second kind of match, you can calculate the high level data from the low level data. This is fairly easy to do in SQL.
Andomar
A: 

I think it depends on what information is valuable to the customer. If they would like to have the information by period, then you should include that as a separate record. Penalty by period and by match must be separated.

I you always had the penalty by period information, then you could do a query that sums the data.

If your periods is always a fixed number, then you could probably just do two columns in the table instead of a new table to hold the period information

Aaron M
+1  A: 

You could save the information in one table, with NULL values indicating that you don't have the data down to that level. You wouldn't be able to put a primary key over that, so you would need a surrogate key, but you should be able to use a unique constraint.

For example:

CREATE TABLE PenaltyCounts
(
    penalty_count_id INT NOT NULL,
    match_id         INT NOT NULL,
    period           TINYINT NULL CHECK (period BETWEEN 1 AND 3),
    lap              SMALLINT NULL,
    penalty_count    SMALLINT NOT NULL,
    CONSTRAINT PK_PenaltyCounts PRIMARY KEY NONCLUSTERED (penalty_count_id),
    CONSTRAINT UI_PenaltyCounts UNIQUE CLUSTERED (match_id, period, lap),
    CONSTRAINT CK_lap_needs_period CHECK (lap IS NULL OR period IS NOT NULL)
)

One problem with this for which I don't see an easy solution yet is how to enforce that they ONLY can enter penalties at one level. For example, they could still do this:

INSERT INTO PenaltyCounts (penalty_count_id, match_id, period, lap, penalty_count)
VALUES (1, 1, NULL, NULL, 5)
INSERT INTO PenaltyCounts (penalty_count_id, match_id, period, lap, penalty_count)
VALUES (2, 1, 1, NULL, 3)
INSERT INTO PenaltyCounts (penalty_count_id, match_id, period, lap, penalty_count)
VALUES (3, 1, 2, NULL, 2)

The advantage of this single-table solution is that your statistics can all be found by querying one table and the GROUP BYs will roll everything up nicely.

You could also use the separate table method but put views over them to pull everything together. This still allows the problem above though of putting numbers in at multiple levels.

Tom H.