views:

208

answers:

3

Hi all, My first question on here so be nice!

I am a junior developer with not much experience and am having trouble with this problem.

I have a table which needs to be auditable. Let's say this table records phone calls made by a call centre (it's not, but it's just an example). I'll call it "CallHistory".

I had originally planned to keep a separate table called "Callees" which has the callees' name, phone number, etc. This table would be using a surrogate primary key.

The CallHistory table would have a foreign key to the Callee table.

I originally did this so that if I changed a callee's phone number, it would propogate throughout the system and I wouldn't have to change the phone number in multiple tables.

The problem is, the whole point of the CallHistory table is to record the HISTORY of calls, including mis-dialed calls (say a caller dialled the wrong number). The history would be lost using this surrogate key approach.

One of the senior developers at work suggested keeping copies of the phone number for each dialling of a caller at that specific time in the CallHistory table to preserve the history.

I was thinking about keeping an audit/change log table for the same purpose.

Would my approach suffice for this purpsose or am I totally off track? Which approach do you prefer?

Cheers, Andrew

+1  A: 

Your question is very typical design dilemma. For example, if you have database in normal form, and you have the following tables: sales, managers (who sells) and regions (where managers are working). You are building reports like "Yearly sales grouped by regions" where you joining sales with managers and managers with regions. But if one of the managers will relocate to another office during the year, seems that your report will show incorrect data, right?

What are 3 solutions

1) In some cases developers and analyst decide: well, our data is not very correct but it is OK for now, we want to stay with normal form and do not duplicate data. This solution is less complex. In this case you can create Callers and CallHistory tables in normal form, i.e. phone number will be in Callers table only.

2) There is a requirement not to lose any historical changes. And we want our reports and queries be very fast (at the cost of database size). In this case people decide to duplicate all fields. For example you can create CallHistory table that has phone number, callers name, address etc., because you anticipate that each of these fields can be changed in the future. Of course you can create Callee table as well (probably you will need it for another purposes) but it may be reefenced by CallHistory and may be not. Suppose that you think that some records need to be deleted from Callee but want them to be in CallHistory. This is the case when developers often think that they can violate referential integrity of the data, do not create any foreign keys from CallHistory table. And this is reasonable, because without foreign keys, inserts will work faster.

3) Approach I like more, but it is most complex from implementation point of view : CallHistory table should reference to CalleeHistory table. CalleeHistory table will have all fileds that Callee table has, but it also has a surrogate key, like CalleeID + DateModified (sometimes instead of DateModified developers use ModificationVersionNumber). In CallHistory we have a surrogate foreign key that reference CalleeID + DateModified. In this case you have normalized data (i.e. Phone number is not dublicated in different tables), and also you didnt lose any historical changes.

As far as I said, there is often a tradeoff between complexity of implementation, database performance, database size, data integrity and functional requirements to the system. If you are a junior developer, it is nice to have in mind all possible solutions, but probably you should listen to a senior developer, who knows about your system and requirements more than anybody on Stack Overflow.

p.s.

If you want to know about other approaches, read about Slowly changing dimensions, for example http://en.wikipedia.org/wiki/Slowly_changing_dimension

Bogdan_Ch
+2  A: 

I think you're being misled by a subtlety concerning the normal forms here. The thing is, the phone number associated with the callee is not the same piece of information as the number dialed by the caller. They might have the same value in the general case, but this is another issue.

So in my opinion, CallHistory should have both the numbre dialed and a reference to the callee table.

Rik
+1  A: 

I agree with Rik. Yes, redundant data is very, very bad, evil, smelly, and otherwise undesirable. But just because two fields are called "phone number" does not make them the same thing. "Customer's current phone number" and "Customer's phone number at the time we last spoke to him" are not necessarily the same thing at all.

I'm presently working with a database that keeps sale and item information. An item record includes such information as description, stock number, and price. Our sale records also include description, stock number, and price. Description and stock number are redundant and should be eliminated. This was bad design. But price must be included in both places. There's a big difference between current price and the price at the time of a given sale. That sale could have been years ago. The price may have changed a dozen times since then.

In general in an application like you describe I'd just put phone number in the history table and be done with it. There's little to be gained by having a "phone number history" table and linking to the applicable-at-the-time phone number record. It might save a few bytes per record but it would add a bunch of complexity. However, if there are several related fields, the story changes. If, say -- and I'm just inventing an example here to give the idea -- you are a health insurance company and your terms of coverage vary depending on the location because of differing state laws, available doctors in the area, etc, so that when a customer moves his policy must be rewritten, now phone number may be related to many other data items, and so all should go in a single table and you link to the appropriate record. Otherwise you could have a New Jersey phone number but you're linking to California policy terms, etc.

Jay