views:

65

answers:

6

I'm attempting to create a database model for movie classifications, where each movie could have a single classification from each of one of multiple rating systems (e.g. BBFC, MPAA). This is the current design, with all implied PKs and FKs:

TABLE Movie 
( 
    MovieId INT -- PK
)

TABLE ClassificationSystem 
( 
    ClassificationSystemId TINYINT -- PK
)

TABLE Classification 
(
    ClassificationId INT,          -- PK
    ClassificationSystemId TINYINT -- FK
)

TABLE MovieClassification 
(
    MovieId INT,          -- composite PK, FK
    ClassificationId INT, -- composite PK, FK
    Advice NVARCHAR(250)  -- description of why the classification was given
)

The problem is with the MovieClassification table whose constraints would allow multiple classifications from the same system, whereas it should ideally only permit either zero or one classifications from a given system.

Is there any reasonable way to restructure this so that a movie having exactly zero or one classifications from any given system is enforced by database constraints, given the following requirements?

  • Do not duplicate information that could be looked up (i.e. duplicating ClassificationSystemId in the MovieClassification table is not a good solution because this could get out of sync with the value in the Classification table)
  • Remain extensible to multiple classification systems (i.e. a new classification system does not require any changes to the table structure)?

Note also the Advice column - each mapping of a movie to a classification needs to have a textual description of why that classification was given to that movie. Any design would need to support this.

A: 

From what you're saying, ClassificationSystemId is part of the key for a MovieClassification, since there can be only one (or zero) MovieClassification for a given system for a given movie.

Now, there are three cases where the Classification table can change:

  1. You add a new classification to a system.
  2. You remove an existing classification from a system.
  3. You change metadata for a classification (none shown in your posted schema).

In the first case, an example would be adding a new genre to an existing system of genres. It makes sense that you need to reclassify movies which belong to the new genre, so the model holds.

In the second case, an example would be removing a genre from an existing system. It still makes sense that you need to reclassify movies which belonged to the old genre, so the model still holds.

In the third case, you would change for instance the name of a genre. It makes sense that movies already classified as that genre gets their genre name changed. The model still holds.

From what I can understand, the correct normalization is to put ClassificationSystemId in MovieClassification and make it part of the MovieClassification key (and make ClassificationSystemId part of the key for Classification rows in the provided schema):

-- Tables Movie, ClassificationSystem not included for brevity

CREATE TABLE Classification
(
  ClassificationId INT,
  ClassificationSystemId INT,
  PRIMARY KEY(ClassificationId, ClassificationSystemId),
  FOREIGN KEY(ClassificationSystemId) REFERENCES ClassificationSystem(ClassificationSystemId)
);

CREATE TABLE MovieClassification
(
  ClassificationId INT,
  ClassificationSystemId INT,
  MovieId INT,
  Advice NVARCHAR(MAX),
  PRIMARY KEY(ClassificationId, ClassificationSystemId, MovieId),
  FOREIGN KEY(ClassificationId, ClassificationSystemId) REFERENCES Classification(ClassificationId, ClassificationSystemId),
  FOREIGN KEY(MovieId) REFERENCES Movie(MovieId),
  UNIQUE(ClassificationSystemId, MovieId)
);
Håvard S
I mean that there is nothing enforcing that if you change the 'master' version in `Classification`, that any copies of the information would get updated also. Of course you could do that imperatively, but that doesn't buy anything because you could check the current constraints imperatively.
Greg Beech
Well, that doesn't really make sense to me. If you change the "master" in `Classification`, you're either adding a new classification or deleting an existing (or changing column data for existing classifications). In all cases the relation still holds without any imperative enforcement as far as I can understand.
Håvard S
See my updated answer for a solution.
Håvard S
A: 

You can enforce that with a check constraint that calls a user defined function. For example:

create function dbo.ClassificationSystemCheck()
returns int
as begin
    return (select max(cnt)
    from (
        select count(*) as cnt
        from MovieClassification mc
        left join Classification c
        on c.ClassificationId = mc.ClassificationId
        group by mc.MovieId, c.ClassificationSystemId
    ) qry)
end
go
alter table MovieClassification
add constraint chk_MovieClassification
check (dbo.ClassificationSystemCheck() <= 1)
go
alter table Classification
add constraint chk_Classification
check (dbo.ClassificationSystemCheck() <= 1)
go
insert into Classification select 1,1
insert into MovieClassification select 1,1
insert into MovieClassification select 1,1 -- Boom!

This can be inefficient as the number of classifications grows. Alternatively, you could eliminate the Classification table and move the ClassificationSystemId to the MovieClassification table.

Andomar
A: 

What if you remove the classification system id from the classification table and only keep it on the movie classification?

TABLE Movie 
( 
    MovieId INT
)

TABLE ClassificationSystem 
( 
    ClassificationSystemId TINYINT
)

TABLE Classification 
(
    ClassificationId INT,
)

TABLE MovieClassification 
(
    MovieId INT,
    ClassificationId INT,
    ClassificationSystemId TINYINT,
    Advice NVARCHAR(250) -- description of why the classification was given
)

but you buy another problem, that a classification could be used ouside its intended system.

joshuacronemeyer
I think that's trading one issue for a worse one; a classification belongs to a system and this would give no way to definitively work out which system a classification belongs to.
Greg Beech
A: 

How many distinct classifications of the same classification system can be attributed to one single movie under your design ?

Does that match your intended concept of "classification" ?

Erwin Smout
This should be a comment, not an answer.
JohnFx
"each movie could have a single classification from each of one of multiple rating systems"
Greg Beech
A: 
  1. add a unique constraint on Classification (ClassificationId,ClassificationSystemId)
  2. add a FK referring to it from MovieClassification table
  3. add a unique constraint on MovieClassification (MovieId,ClassificationSystemId)
AlexKuznetsov
A: 

OK. I had hoped my questions would trigger some thinking, but my point seems to have been missed.

Your classification table needs to be {movieID, classificationScheme classification}, with key {movieID classificationScheme}.

It can reference Movie via {movieID}, and it can reference a classifications table via {classificationScheme classification}.

This classifications table lists/enumerates/names all the valid classifications of each scheme. Since a classification scheme only exists, and makes sense, from the moment it has at least one classification, there may not be a real need for a fourth table whose sole purpose it is to list/name/enumerate all the relevant classification schemes.

Erwin Smout