I have to maintain an old database which is not properly normalized. For instance there is a project table that has grown (or maybe mushroomed) to have 5 or more different date columns, for different milestones of the project from being ordered to the delivery date. There are also several tables each with columns for street addresses, mail addresses or web links.

I would like to normalize the structure, create tables for addresses, scheduled dates and the like, and the necessary tables to allow for 1:N relations (address per customer, due date per project and so on).

Right now I'm completely unsure how to handle changes to the data in the detail tables. Consider for example the change of a customer delivery address. Changing the data in the address table is out of the question, because more than one record (possibly in more than one table) could reference that record. Adding a new address record could leave the old record orphaned if no other row has a foreign key relation to it.

I have thought about the following ways to handle this:

  • Add a new detail record, and check in an update trigger of the master table whether the old detail record has to be deleted. This would require knowledge of every table that references the detail table, either duplicated in each master table's trigger or centralized in a sproc. I don't like this loss of separation. It would also involve more tables in the active transaction.

  • Let the trigger try to delete the old detail record, and catch any errors. This just feels wrong.

  • Live with the orphaned record, and have a periodic maintenance task clean up all detail tables.

What is the preferred way to handle data changes in detail tables that are linked to several master tables? Any tips for reading up on this?

+2  A: 

Live with the orphaned record, and have a periodic maintenance task clean up all detail tables.
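
A rough sketch of what such a maintenance task could look like, using Python's built-in sqlite3 module. The table and column names (address, customer, supplier, address_id) are made up for illustration, and the list of referencing tables has to be maintained by hand, which is the price of this approach.

```python
import sqlite3

def purge_orphaned_addresses(conn, referencing):
    """Delete address rows that none of the given (table, column) pairs reference."""
    if not referencing:
        raise ValueError("need at least one referencing (table, column) pair")
    # Identifiers cannot be bound as parameters, so these names must come
    # from trusted code, never from user input.
    conditions = " AND ".join(
        f"NOT EXISTS (SELECT 1 FROM {table} WHERE {table}.{column} = address.id)"
        for table, column in referencing
    )
    cur = conn.execute(f"DELETE FROM address WHERE {conditions}")
    conn.commit()
    return cur.rowcount

# Hypothetical schema: two master tables sharing one address table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, address_id INTEGER REFERENCES address(id));
    CREATE TABLE supplier (id INTEGER PRIMARY KEY, address_id INTEGER REFERENCES address(id));
    INSERT INTO address (id, street) VALUES (1, '1 Main St');
    INSERT INTO address (id, street) VALUES (2, '2 High St');
    INSERT INTO address (id, street) VALUES (3, '3 Old Rd');  -- nobody points here
    INSERT INTO customer (id, address_id) VALUES (1, 1);
    INSERT INTO supplier (id, address_id) VALUES (1, 2);
""")

removed = purge_orphaned_addresses(
    conn, [("customer", "address_id"), ("supplier", "address_id")]
)
```

In a real system this would be run from a scheduled job, and any new table that references address has to be added to the list before the job runs again.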

+1  A: 

Part of the problem may be the original schema design: the foreign keys point the wrong way, treating addresses, phone numbers, etc. as master instead of detail. This may be convenient when you want all uses of a given address to update at once, but in my experience it always devolves into too many difficult exceptional cases. For example, when one person at a location moves you need to break their link, whereas when an entire household or office moves you update the existing record. If you try to hide this detail from the user on the CRUD screen, you'll end up with a situation where it just doesn't do what you want.

If it's done that way just to collapse duplicate values, it's effectively a denormalization of the database: the mere existence of the address row is meaningless. The only difference is that unlike most denormalizations, it attempts to gain space efficiency instead of speed. Creating a link table at that point is simply compounding the problem.

If you want, for example, multiple addresses per contact, make the addresses a detail table with a foreign key pointing back to the parent contact, and don't worry about duplicated address values because they're just values. Otherwise, make Address a real entity: add a title or description field and a CRUD screen so it can stand on its own as an entity.
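
To make the detail-table variant concrete, here is a minimal sketch using Python's built-in sqlite3 module. The names contact, contact_address, kind and so on are invented for the example, not taken from the question's schema. Each address row belongs to exactly one contact, so changing it cannot affect anyone else and nothing can be orphaned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The foreign key points from the detail row back to its parent contact.
    CREATE TABLE contact_address (
        id         INTEGER PRIMARY KEY,
        contact_id INTEGER NOT NULL REFERENCES contact(id) ON DELETE CASCADE,
        kind       TEXT NOT NULL,   -- e.g. 'delivery', 'billing'
        street     TEXT NOT NULL
    );
""")

conn.executemany("INSERT INTO contact (id, name) VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
# Both contacts start at the same street; the value is simply stored twice.
conn.executemany(
    "INSERT INTO contact_address (contact_id, kind, street) VALUES (?, ?, ?)",
    [(1, "delivery", "1 Main St"), (2, "delivery", "1 Main St")],
)

# Alice relocates: a plain UPDATE on her own row, Bob's row is untouched.
conn.execute(
    "UPDATE contact_address SET street = ? WHERE contact_id = ? AND kind = ?",
    ("9 Oak Ave", 1, "delivery"),
)
conn.commit()

streets = dict(conn.execute("SELECT contact_id, street FROM contact_address"))
```

With ON DELETE CASCADE, deleting a contact removes its address rows as well, so no cleanup logic is needed anywhere else.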

Jeffrey Hantin
The original schema design has no foreign keys; I want to modify the schema in this direction. I will have to think about your answer, but it looks like the point about keys pointing the wrong way could help.
mghie
After some thinking I'm kind of agreeing with your last paragraph - but what does "they're just values" really mean? Doesn't that completely ignore normalization rules? I'm uncomfortable with the idea of a table with lots of rows differing only in their PK and FKs and equal in all other fields.
mghie
In some sense, this is like worrying about there being so many pixels with the same color value in an image, that differ only by their coordinates. You may want to use some sort of compression -- space optimization -- for transfer or storage purposes, but a compressed image file does not make for efficient reading and writing of single arbitrary pixels.
Jeffrey Hantin
A: 

I think you are blurring the delete and update cases.

If you have client a and client b, and they both use the same address, that would be reflected by records in a relational table (say ClientAddresses, although if you are storing addresses for multiple entities, I am sure it will be more complex than that).

I would think that if two clients share an address and it is incorrect for client a, it would be incorrect for client b as well (i.e. a data entry error). But if you are sure that you do not want client a's changes to be made to the base address info, remove the association record (delete from ClientAddresses) and add a new address. When you perform the delete from the relational table (presumably from a stored procedure), check whether any other records refer to the address record being disassociated; if not, delete it from the base table.
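
As a rough illustration of that sequence, here it is in Python with the built-in sqlite3 module standing in for the stored procedure. The table names address and client_address and the function name are assumptions for the example. Note that the check only looks at client_address; if other tables also reference address, each of them would need the same check.

```python
import sqlite3

def replace_client_address(conn, client_id, old_address_id, street):
    """Give one client a new address; drop the old one if no client still uses it."""
    with conn:  # one transaction: commit on success, roll back on error
        # 1. Remove the association between this client and the old address.
        conn.execute(
            "DELETE FROM client_address WHERE client_id = ? AND address_id = ?",
            (client_id, old_address_id),
        )
        # 2. Delete the base record only if nothing else refers to it.
        still_used = conn.execute(
            "SELECT EXISTS (SELECT 1 FROM client_address WHERE address_id = ?)",
            (old_address_id,),
        ).fetchone()[0]
        if not still_used:
            conn.execute("DELETE FROM address WHERE id = ?", (old_address_id,))
        # 3. Insert the new address and associate it with the client.
        new_id = conn.execute(
            "INSERT INTO address (street) VALUES (?)", (street,)
        ).lastrowid
        conn.execute(
            "INSERT INTO client_address (client_id, address_id) VALUES (?, ?)",
            (client_id, new_id),
        )
    return new_id

# Hypothetical data: clients 1 and 2 share address 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
    CREATE TABLE client_address (
        client_id  INTEGER NOT NULL,
        address_id INTEGER NOT NULL REFERENCES address(id),
        PRIMARY KEY (client_id, address_id)
    );
    INSERT INTO address (id, street) VALUES (1, '1 Main St');
    INSERT INTO client_address VALUES (1, 1);
    INSERT INTO client_address VALUES (2, 1);
""")

# Client 1 relocates; address 1 survives because client 2 still uses it.
first = replace_client_address(conn, 1, 1, "9 Oak Ave")
```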

cmsjr
Consider a new customer who shares an address with another customer, or even with a different kind of entity that also has address entries. I'm concerned with the case where only one of those addresses later changes (say the customer relocates). Both addresses are correct, but no longer equal.
mghie
In the case of a relocation, we remove the old address and add a new address. So that is one delete from the relational table (plus one delete from the base table if no other relations refer to the old address), one insert into the base table, and one insert into the relational table.
cmsjr