views:

188

answers:

5

I have a relatively simple subset of tables in my database for tracking something called sessions. These are academic sessions (think offerings of a particular program). The tables to represent a sessions information are:

sessions
  session_terms
    session_subjects
      session_mark_item_info
        session_marks

All of these tables have their own primary keys, and are like a tree, in that sessions have terms, terms have subjects, subjects have mark items, etc. So each on would have at least its "parent's" foreign key.

My question is, design wise is it a good idea to include the sessions primary key in the other tables as a foreign key to easily select related session items, or is that too much redundency?

If I include the session foreign key (or all parent foreign keys from tables up the heirarchy) in all the tables, I can easily select all the marks for a session. As an example, something like

SELECT mark FROM session_marks WHERE sessionID=...

If I don't, then I would have to combine selects with something like

WHERE something IN (SELECT...

Which approach is "more correct" or efficient?

Thanks in advance!

+3  A: 

The second approach is more correct. And actually to pull the session information you would join tables, do not be afraid to JOIN that is the whole point of relational databases. You do not want to repeat yourself (normalization). So you would only keep a reference to the parent, and not the parent.parent.

This question comes up a lot for beginners, they think creating the same key in sub sub tables is going to make their life easier as there select can then just become:

SELECT blah FROM MyTableSubSub WHERE SessionID=340

The problem is you are introducing repeatitive data inside of tables that probably do not need to know their parents parent. In fact, somewhere in your design you can find this information by Joining to another table. For instance:

SELECT blah FROM MyTableSubSub mtss INNER JOIN ParentTable p ON p.ID = mtss.ID...

Once you are at that point you can then find the session id i nthe parent table. So do NOT repeate columns in tables that are related.

JonH
Really appreciated this answer. Hard to accept between yours and The Machine's. You are right, I am a beginner with larger databases, and I have been a little afraid of JOINS ;)
Carvell Fenton
+2  A: 

My question is, design wise is it a good idea to include the sessions primary key in the other tables as a foreign key to easily select related session items, or is that too much redundency?

Design-wise, this would add redundant data, and would break normalization. In general you should not be adding redundant fields just to avoid JOINs.

What you describe is called denormalization. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software, but this comes with many trade-offs that are often more important in the vast majority of situations.

Daniel Vassallo
+1 Thanks for the references for normalization and denormalization.
Carvell Fenton
+1  A: 

Your instincts are good. You certainly don't want many (or any) redundant foreign keys around to confuse things. Redundant data should be avoided. In many projects though, there is a couple or small handful of places where reduntant data can really simplify things elsewhere. If this is the once duplicated foreign key, and you get a lot of simplicity for it elsewhere, then do it. If you do go this route, then your challenge is making it clear. Two things you can do:

  • have these superflous foreign keys way down the field list in each table, so a developer looking at it later will get a hint that this is extra data, and not the model's real foreign key relationship.
  • Name the field something different, not sessionID. Call it . . . ParentSession. Calling it SessionID will be a problem.

But remember, these modern relational DB's are good at joining, as long as your indexes are good. Only do this if it really makes things simpler.

Patrick Karcher
A: 

The best design approach is really to have each of the tables have a foreign key reference to its immediate parent. Having keys of all the parents above the table, will only cause redundancy, and is not advisable. Your concern about performance can be done away with, if you take care to index the tables (in this case you wouldn't need indexing since it is the primary key that you are referencing in the tables below).

And also , using WHERE something IN(Select ..) will cause performance problems for sure. I suggest you can use/customize joins with the schema that i have suggested to create any kind of query for the tree in your database. As an example. For your requirement of finding sesssion marks in a session, you will write

Select * from session_marks
from session, session_terms, session_subjects, session_mark_item_info,session_marks

Where 
session_marks.parent_id = session_mark_item_info.id
and session_mark_item_info.parent_id = session_subjects.id
and session_subjects.parent_id = session_terms.id
and session_terms.parent_id = session.id
and session.id = <some value>

Since the joins are on the indexed(in fact primary) columns, they will be super-fast. You dont have to worry about the performance.

The Machine
Downvoted for using poor syntax of implied joins. These are 18 years out of date and should not be used. They are a very poor practice (hard to maintain, susceptible to inadvertent cross joins) and people should not be encouraged to use them.
HLGEM
Did a little more research on HLGEM's comment concerning implicit vs explicit, and although there are a few discenting voices out there, it seems that the explicit syntax is preferred for readability and debugging, although it appears not to affect performance. So... switched accepted to JonH's answer. Thanks again.
Carvell Fenton
-1 - returning * is poor practice. You should list ALL columns you need and never use *. Joins need to be stated explicitly.
JonH
Although folks have pointed out some issues with this answer, it was still useful in helping me get a better understanding of the concept, so I feel it was useful for that reason.
Carvell Fenton
What are all the downvotes for ? My aim was to explain the concept of constructin queries for mutlitable database trees.@JonH - Select * is poor practice ?? My aim was to show how the join was to be made, the user will obviously select only the selected columns. I guess people just cant accept others answers getting accepted.
The Machine
@The Machine - I could care less if the OP selected your answer or mine. Your answer is full of old syntax and you do not explicitly state your joins. You also present poor practice by returning * so -1 to bring ensure begineers or other's coming to the site do not use this code. That is the whole point of SO.
JonH
+1  A: 

I agree with all the prior posts (lots of upvotes there).

One reason you might be considering addng the extra column(s) is that, without them, you end up writing a lot of redundant code (five tables, four inner joins, once or more per database call, yuk). One work-around for this issue is to create a view that "pre-joins" the tables; with this in place, you wouldn't have to rewrite the N-way join each time you had to query down N levels. Some psuedo-code:

CREATE VIEW MyHierarchy AS
SELECT 
  sessions_id
  session_terms_id
  session_subjects_id
  session_mark_item_info_id
  session_marks_id
from sessions
inner join session_terms on ids
inner join session_subjects on ids
inner join session_mark_item_info on ids
inner join session_marks on ids

(Use left outer joins if parent items do not always have child items.)

As an example, to get the mark_item_info for a given session_term, you'd run something like

SELECT mii.*
 from MyHIerarchy mh
  inner join session_mark_item_info mii
   on mii.session_mark_item_info = mh.session_mark_item_info
 where mh.session_term_id = @session_term_id

thus saving a chunk of coding time. And, check the query optimization plan but I'm 95% sure of this, the tables mentioned in the view but not actually used in the query would be "dropped" from the execution plan since they weren't required.

Philip Kelley
+1 for the view suggestion. Thanks
Carvell Fenton