views:

148

answers:

5

I'm building a small application and setting up foreign key relationships between tables. However I'm confused as to WHY I really need this? What is the advantage - does it assist me when writing my queries that I don't have to perform any joins? Here's an example snippet of my database:

+-------------------+
| USERS             |
+-------------------+
| user_id           |
| username          |
| create_date       |
+-------------------+

+-------------------+
| PROJETS           |
+-------------------+
| project_id        |
| creator           |
| name              |
| description       |
+-------------------+

There is a key relationship between users.user_id and projects.creator

Would I be able to perform a query like so?

SELECT * FROM PROJECTS WHERE USERS.username = "a real user";

Since MySQL should know the relationship between the tables? If not then what is the real function of Foreign keys in a database design?

A: 

If each user belongs to exactly one project and each project belongs to exatctly one user then it is said that the tables have a one to one relationship and having a key relationship between users.user_id and projects.project_id is ok (though probably not standard).

However, if a user can belong to many projects (or a project can have many users) then you have a one to many relationship and you need a foreign key.

If projects can have many users and users can belong to many projects then you have a many to many relationship and you need a more sophisticated model.

EddieC
I've setup plenty of database designs without creating any foreign keys and they have used a one to many relationship. I want to know what Foreign keys do for me that I can't do without them.
Marco Ceppi
Given your table layouts above, What would your SQL statement be to get all the users who are part of project_id=1?
EddieC
@Marco Ceppi: While you've modelled relationships without using foreign keys, there's nothing to stop a user from entering data that is not related into the column(s) that represent that relationship.
OMG Ponies
So which part of this answers the question here?
Jani Hartikainen
+9  A: 

Foreign keys provide referential integrity. The data in a foreign key column is validated - the value can only be one that already exists in the table & column defined in the foreign key. It's very effective at stopping "bad data" - someone can't enter whatever they want - numbers, ASCII text, etc. It means the data is normalized - repeating values have been identified and isolated to their own table, so there's no more concerns about dealing with case sensitivity in text... and the values are consistent. This leads into the next part - foreign keys are what you use to join tables together.

Your query for the projects a user has would not work - you're referencing a column from the USERS table when there's no reference to the table in the query, and there's no subquery being used to get that information before linking it to the PROJECTS table. What you'd really use is:

SELECT p.*
   FROM PROJECTS p
   JOIN USERS u ON u.user_id = p.creator
WHERE u.username = 'John Smith'
OMG Ponies
Thank you - I figured I could get a way from Joins with this - but you answer helps describe exactly what this does.
Marco Ceppi
+3  A: 

Basically, they won't give you any more functionality. They stop any inserts or updates breaking the referential integrity of your data model.

Another key aspect in my opinion is that they communicate the data model to any other developers using it. I have often looked at what foreign keys a table has to see how the data model fits together at a glance.

Fiona Holder
I'd say the data validation is functionality - a foreign key constraint is like a CHECK constraint on steriods. CHECK constraints only validate the data; Foreign keys mean you can add associated information to that relationship--in addition to the data validation.
OMG Ponies
+1  A: 

If you never do joins, you don't need foreign keys.

Come to think of it, if you never do joins, you don't need a relational database! (little joke) Seriously, if you have more than one table, you'd better learn how to use joins, and how to design foreign keys into a schema.

As previous responders have said, foreign keys enforce referential integrity. Without referential integrity, joins produce mysterious results.


My earlier response failed to note the real question behind the question. The question I answered was "why are there foreign keys in SQL schemas", and the answer is "in order to do joins". But rereading the the question, I'm understanding a much more subtle question, which is "why can't SQL do the joins for me, if it knows the linkages". This is a really good question. It deserves a better answer than the above.

A language where the engine supplies the join conditions is possible. One need only look at the graphical query design tool in Microsoft Access. Once one has declared all the intertable relationships to Access, one can pull data from multiple tables without specifying the join conditions all over again. Access figures them out automatically.

If one builds a two table query in Access, and then switches to SQL view, one will see that Access has in effect created a join with a join condition. Such a capability is possible in character based languages as well, but SQL is not such a language.

I note in passing that many projects can belong to one user. So Users is the "reference table" in the above schema, not Projects. I expect the easier automatic navigation direction would be automatic lookup from a reference table, not the other way around.

Walter Mitty
+1  A: 

Using a foreign key constraint can provide the following:

  • Prevent the database containing inconsistent data by preventing mismatched keys
  • Prevent the database containing inconsistent data by automatically deleting orphaned rows (with ON DELETE CASCADE)
  • Serve to document to future developers which column is a foreign key to which

Of course it's not mandatory to do any of those things, but likely to improve code quality over time. Being strict about thing is usually good - it leads to more errors in testing (and hence fewer in production)

MarkR