views:

77

answers:

3

I'm storing data on baseball statistics and would like to do so with three tables: players, battingStats, and pitchingStats. For the purpose of the question, each player will have batting stats or pitching stats, but not both.

How would I normalize such a relationship in 3NF?

+5  A: 

PlayerId would be a foreign key in both BattingStats and PitchingStats tables

[and remember to put some time dimension (season, year, et al) in the stats tables]

and by the way, this is a bad assumption: as far as I know, pitchers are allowed to bat, too!

Steven A. Lowe
+2  A: 

Are you really required not to use more than 3 tables. Normalization normally implies breaking down one non-normalized model into many normalized relations.

If you can have more than 3 tables, you may want to consider the following (in 3NF):

Players:        ([player_id], name, date_of_birth, ...)
Batters:        ([batter_id], player_id)
Pitchers:       ([pitcher_id], player_id)
Batting_Stats:  ([batter_id, time_dimension], stat_1, stat_2, ...)
Pitching_Stats: ([pitcher_id, time_dimension], stat_1, stat_2, ...)

Attributes in [] define the primary key, but a surrogate key may be used if preferred. The player_id attribute in Batters and Pitches should have a unique constraint, and it should also be a foreign key to the Players relation. Batting_Stats and Pitching_Stats should also have a foreign key to Batters and Pitching respectively.

Note however that the above does not enforce that a player can be only a batter or only a pitcher.


UPDATE:

One method I am aware of to enforce that a player is only a batter or only a pitcher, is through this model:

Players:        ([player_id], name, date_of_birth, ...)
Roles:          ([role_id, role_type], player_id)
Batting_Stats:  ([role_id, role_type, time_dimension], stat_1, stat_2, ...)
Pitching_Stats: ([role_id, role_type, time_dimension], stat_1, stat_2, ...)

The role_type should define a pitcher or a batter. Batting_Stats and Pitching_Stats should have a composite foreign key to Roles using (role_id, role_type). A unique constraint on player_id in Roles would ensure that a player can only have one, and only one, role. Finally add check constraints so that Batting_Stats.role_type = 'Batter' and Pitching_Stats.role_type = 'Pitcher'. These check constraint guarantee that Batting_Stats is always describing a batter, and note a pitcher. The same applies for Pitching_Stats.

Daniel Vassallo
It is not immediately obvious to me how the insertion of the tables Batters and Pitchers improves the data model or makes it more normal. Those tables only seem to repeat the data that would be present if player_id were used directly in the Batting_Stats and Pitching_Stats tables.
Larry Lustig
@Larry: Those tables define the set of batters and the set of pitchers from the "players pool". That *is* new information. Then batting stats can only refer to a player from the "batting set", and the same for pitching.
Daniel Vassallo
But that exact same information would be available by examining the player_id column in Batting_Stats and Pitching_Stats. I do see one additional case that's handled by your design, however — defining a player's role in the absence of any stats at all. If that is required then the extra tables would meet that need (as would an extra role_type column in Players).
Larry Lustig
Sometimes this role/type arrangement is necessary, but I don't see it here. Up front, you already know all the roles, so there is no reason to provide the ability to add additional roles. Also, by doing this you are giving meaning to individual rows in the role table ("pitcher" vs. "batter") as a means of enforcing data rules and constructing a proper query. Again, this is sometimes necessary but should be avoided unless absolutely needed.
Phil Sandler
+1  A: 

I know how I would implement this from a practical perspective (I'd create a UNIONed view over the disjoint tables and put a unique index on the player ID - therefore, they can only appear in one table).

Or in the players table, record what type of statistics they have, and then include that in the FK relationship from the stats tables.

But either of these is probably closer to the metal than you want.

Damien_The_Unbeliever