views:

1594

answers:

10

Hi Guys,

I'm trying to get my head round this mind boggling stuff they call Database Design without much success, so I'll try to illustrate my problem with an example.

I am using MySQL and here is my question:

Say I want to create a database to hold my DVD collection. I have the following information that I want to include:

  1. Film Title
  2. Actors
  3. Running Time
  4. Genre
  5. Description
  6. Year
  7. Director

I would like to create relationships between these to make it more efficient but don't know how.

Here is what I'm thinking for the database design:

Films Table => filmid, filmtitle, runningtime, description

Year Table => year

Genre Table => genre

Director Table => director

Actors Table => actor_name

But, how would I go about creating relationships between these tables?

Also, I have created a unique ID for the Films Table with a primary key that automatically increments, do I need to create a unique ID for each table?

And finally if I were to update a new film into the database through a PHP form, how would I insert all of this data in (with the relationships and all?)

thanks for any help you can give, Keith

A: 

You don't really need a YearTable, and all you need is a genre_id, director_id, and actor_id columns in your films table.

Also, your genre, director, and actor tables need their own unique IDs.

Edit: This is, of course, assuming that you're only going to have 1 genre, director, and actor for each movie. Which probably isn't the case.

To have many actors belonging to many movies, you will need a seperate relations table. You could call it "moviesActors" (or actorsMovies) and each row will have an actor_id and a movie_id to say this actor was in this movie.

Dean
"...and all you need is a genre_id, director_id, and actor_id columns in your films table" - but this will create repeated data which is redundant right?
Keith Donegan
"all you need is a...actor_id column in your films table" - that'd be true only if a movie had 1 and only 1 actor.
Mark Brackett
You don't need unique ID's if some other field is unique, in general. You do need a primary key of some sort. Your primary key could be "genre_name" for example, as those will not repeat. Your primary key could be "actor_name" combined with "year of birth" if that is unique.
Sasha
Sasha: not totally true. Name is unique, but makes a terrible primary key. People's names change - people spell names differently, and not to mention a many-bytes name field is slower to join/sort on than a 4-byte integer field.
gregmac
In general, I don't think the joins will be noticeably slower for 99% of the cases, and I like the improvement in readability (also, using name as a primary key might even slightly reduce how often joins are needed.) And yeah, I am assuming names do not change often.
Sasha
Keith: no, generally id fields are not considered redundant, because it is just a link.
gregmac
Sasha: I'd still argue name makes a bad key. Maybe it is varchar(100) to start, but then you find a long name and need to increase the length- there are several places this has to be done now. Also, it can have an impact on the size of the database as you get more entries.
gregmac
+1  A: 

Every table should have a primary key which is unique.

You should read up on database normalization.

A year table is probably unnecessary.

If it's year of release, say, then the year can be stored in the film.

If there are multiple directors on a film, then you would have a separate table which would hold the primary key of the film table and the director table. Similarly for any of the foreign key constraints which are many-to-one or many-to-many. In particular, I believe this would apply to the Actor.

Cade Roux
+4  A: 

Your Films table also needs links to the genre, director, and actors tables. Since the actors, at least will be many to many (one film will list more than one actor, one actor will be in more than one film), you'll need a table to link them.

Films Table => filmid, filmtitle, runningtime, description, genreid, directorid
Genre Table => genreid, genre
Director Table => directorid, director
Actors Table => actorid,actor_name
FilmActor link table => actorid, filmid (with a record linking each actor to each film)

Any table that might be many to many needs a linking table.

thursdaysgeek
+5  A: 

These are the tables I'd use:

films (_id_, title, runningtime, description)
genres (_id_, name)
people (_id_, name, birthdate, etc...)
roles (_roleid_, rolename)
filmgenres (_filmid_, _genreid_)
castandcrew (_filmid_, _roleid_, _personid_)

Instead of having a directors and actors table, just have one table of people. This can also include crew members (in case you want to track who the 2nd Junior Assistant Dolly Grip was). Each movie can be any number of genres (comedy and horror, for example). Plus, the people can take any number of roles on each film - there are quite a number of actor/directors out there.

The Roles table doesn't necessarily mean the character the actor is playing, but it could. It could be "Director", "Producer", "Actor"... or even "Luke Skywalker" if you wanted to get that finely-grained... I believe IMDB does that.

Hopefully the names of the fields above should hint at the foreign keys, and i've put _underscores_ around the primary keys I'd use.

nickf
+1  A: 

I have created a unique ID for the Films Table with a primary key that automatically increments, do I need to create a unique ID for each table?

Yes, each table must have a unique id. But, that's not necessarily the primary auto incrementing key - it's whatever makes that particular instance unique. For instance, for movies, I think it's common to be title + year of release - though you'd want to check with a movie buff (a domain expert) to be sure of that. The auto increment is a fallback - basically, when you really don't have anything else to uniqueify on.

You may use an auto increment key for ease of use in joins and such, but you should have a unique constraint on the uniqueness fields anyway.

As for the actual design, I'd suggest something like:

Films => Primary Key(filmid), Unique Constraint(filmtitle, year), 
         runningtime, description, 
         Foreign Key(Genre), Foreign Key(DirectorId)

Genre Table => Primary Key(Genre)

Director Table => Primary Key(DirectorId), DirectorName

Actors Table => Primary Key(ActorId), ActorName

Films_Actors => Primary Key(Foreign Key(ActorId), Foreign Key(FilmId))

For the insert, well - frankly, it's a PITA. You need to insert in reverse order (and this is where auto increment keys can be an even bigger PITA - if you can add date of birth or something into the Actors and Directors table, then a unique constraint can make it easier).

So, you'd insert Actor(s), Director, Film, and then Films_Actors. Ideally, all in a single transaction. Also, I assume Genre is already filled in, and is a select list - so it doesn't need to be inserted.

Mark Brackett
+1, I don't like using auto_increment fields unless absolutely necessary,
Sasha
Does using a string as a primary key have any implications on performance as opposed to an int?
Simon Hartcher
@deevus - Yes, generally, the smaller the key the better the performance. However, if you can save the join (because Genre is meaningful by itself in the Films table) it's even faster still. Plus, the relative size of the Genre table (probably < 100 rows) makes it pretty much a moot point anyway.
Mark Brackett
+3  A: 

What follows is not actual MySQL code. It seems like what you need is more of a conceptual start here. So here's a model of what your database should look like.

Actor table

  • id (primary key)
  • first name
  • last name
  • etc. (any additional columns you want to store on an actor)

Director table

  • id
  • first name
  • last name
  • etc.

Genre table

  • id
  • name
  • etc.

Film table

  • id
  • title
  • description
  • running time
  • release date
  • director id -- this is a foreign key that refers to the id (the primary key) of the director who directed the film
  • genre id -- like the director id, this refers to the id of the genre the film belongs to

Actor-film index table

  • film id -- this is a foreign key that refers to the id of the film
  • actor id -- this is a foreign key that refers to the id of one actor in the film.

For each actor in the film, you would add a row to the Actor-Film Index. So, if actors 5 and 13 (the primary keys for those actors) starred in film 4 (again, the primary key for that film), you'd have two rows reflecting that fact in your index: One with film id = 4, and actor id = 5, and another with film id = 4, and actor id = 13.

Hope that helps.

Also, this assumes that each film has exactly one director. If any film in your library has two directors (such as Slumdog Millionaire), you'd want to separate out the director id from the film table, and create a Director-Film index like the Actor-Film Index as above.

bigmattyh
Thank you for such a fantastic and detailed response. Much appreciated!
Keith Donegan
+16  A: 

You have to make a distinction between attributes and entities. An entity is a thing - usually a noun. An attribute is more like a piece of describing information. In database jargon, entity = table, attribute = field/column.

Having a separate table for certain things, let's use director, as an example, is called normalizing. While it can be good in some circumstances, it can be unnecessary in others (as generally it makes queries more complicated - you have to join everything - and it is slower).

In this case, having a year table is unnecessary, since there are no other attributes about a year, besides the year itself, that you would store. It is better to denormalize this and store the year in the film table itself.

Director, on the other hand, is different. Perhaps you'll want to store the director's first name, last name, date of birth, date of death (if applicable), etc. You obviously don't want to enter the director's birth date every time you enter a film that this person directs, so it makes sense to have a separate entity for a director.

Even if you didn't want to store all this information about the director (you just want their name), having a separate table for it (and using a surrogate key - I'll get to that in a second) is useful because it prevents typographic errors and duplicates - if you have someone's name spelled wrong or entered differently (first,last vs last,first), then if you try to find other movies they've directed, you'll fail.

Using a surrogate key (primary key) for tables is generally a good idea. Matching an integer is much faster than matching a string. It also allows you to freely change the name, without worrying about the foreign keys stored in other tables (the ID stays the same, so you don't have to do anything).


You can really take this design quite far, and it's all a matter of figuring out what you want to be able to store in it.

For example, rather than have a single director per film, some films have multiple directors.. so there would be a many-to-many relationship between films and directors, so you'd need a table with eg:

films_directors => **filmid, directorid**

Taking it a step further, sometimes directors are also actors, and vice-versa. So rather than even have director and actor tables, you could have a single person table, and join that table in using a role table. The role table would hold various positions - eg, director, producer, star, extra, grip, editor.. and it would look more like:

films => **filmid**, title, otherstuff...
people => **personid**, name, ....
roles => **roleid**, role name, ....
film_people => **filmid, personid, roleid**
genre => **genreid**, name, ...
film_genre => **genreid, filmid**

You might also have a role_details field in the film_people table, which could contain extra information depending on the role (eg, the name of the part the actor is playing).

I'm also showing genre as a many<>many relationship, because possible a film is in multiple genres. If you didn't want this, then instead of the film_genre table, films would just contain a genreid.

Once this is set up, it is easy to query and find everything a given person has done, or everything a person has done as a director, or everyone who has ever directed a movie, or all the people involved with one specific movie.. It can go on and on.

gregmac
Thank you so much for your indepth responce Greg, much appreciated mate. That made alot of sense. Quick question, every time I want to create a relationship between two/three/four tables etc, do i have to create another table to join them all?
Keith Donegan
You only need joining tables for many<>many relationships between entities that have direct relations -eg, films and genres. Genres can have a relationship to people, but it's not stored directly - you just do a join from genre > films > people (eg. you could find people who've been in horror films)
gregmac
I wish I could up vote this twice, excellent response.
Ryan Guest
+2  A: 

I realize your question has already been answered, however I wanted to point you to:
http://www.imdb.com/interfaces

IMDB provides flat-text files of their database (minus primary keys). You might find this useful to populate your database once you get going, or you could use it in your program/website to allow you to simply search for a movies title to add to your "DVD Collection", and have the rest of the information pulled from these.

Simucal
A: 

Sometimes actors are directors and vice versa, maybe you want a "people" table?

leancz
A: 

no one has discussed the user details tables and sales tables.... i will come up with them shortly

sowji