tags:

views:

31

answers:

2

I have two tables X and Y. This is back end of a web based CRUD application.

Let's say that rows in Y have fields a, b and c. When a row in X is created, it has to link to a snapshot of the values in a specific row of Y. The row in Y may change later but the row in X should still have the values of a, b and c at the time of creation.

What is the "correct" way of doing something like this? Duplicating the fields a, b and c in X seems to be straightforward but I was wondering if there are any better ways of doing this.

+2  A: 

"Duplicating" the columns seems like the right move to me. Some people may erroneously think that you're violating proper database design by duplicating data, but really each table (and its columns) is holding different data. One is the "current" values and the other table has those values at a specific point in time.

An analogy would be addresses on an invoice. If you need to capture the address of a customer at the time of the invoice then you're not duplicating data by putting address columns with the invoice as well as having an address with the customer as they hold potentially different versions of the address.

Tom H.
The added benefit to duplicating the columns will remove the cost of a `JOIN` onto `Y`.
Adam
This makes sense. I'm just wondering if there are any problems with the fundamental way I'm looking at the problem. I'm not very comfortable with designing schemas. I appreciate the "different data" point. I don't think it violates any duplication rules either.
Noufal Ibrahim
Just for the record, I think that adding columns to a table just to avoid a join is one of the worst reasons to do it. Put the columns where they belong. If you have performance issues down the road then maybe denormalize as a last resort. It's my experience that putting columns somewhere to prematurely aid in performance instead of putting them where they naturally belong invariably leads to unintended consequences (read, problems)
Tom H.
In this instance, you feel it's justified and "the right way to go" though?
Noufal Ibrahim
+2  A: 

You need to implement a "Slowly Changing Dimension".

basicly in addition to the "natural" key, you also have a valid_from and valid_to timestamps.

The current row always has a "valid_to" of "9999-12-31 23:59:59" or the higest date/time you system can handle.

The first row would have:-

id       valid_to    valid_from     description.
 1      2009-12-25   9999-12-31     My best new toy.

When you change the description you do not overwrite the current row but add a complete new row an change the valid_to date in the old row thus:-

id       valid_to    valid_from     description.
 1      2009-12-25   2009-06-22     My best new toy
 1      2010-06-23   9999-12-31     A broken toy.

You can then use the create date of your row Y to the appropriate row X flr that date.

Select * from x 
         join Y on x.id = y.id 
                and x.create-date >= valid_from
                and x.create-date <= valid_to

I recomend you use timestamps rather than dates, but the principle is easier to illustrate using dates.

James Anderson
This makes sense. You're basically suggesting that I alter the table `Y` a little bit to make it archive data with the period and then use the time of creation in `X` table to pick up the "right" row of `Y`. Am I correct?
Noufal Ibrahim
Yep thats it! In datawarehousing this tequnique is called "Slowly Changeing Dimension type 1", outside of the DW world it is sometimes know as "bi-temporal tables".
James Anderson