views:

1171

answers:

7

I'm wondering if this is a good design. I have a number of tables that require address information (e.g. street, post code/zip, country, fax, email). Sometimes the same address will be repeated multiple times. For example, an address may be stored against a supplier, and then on each purchase order sent to them. The supplier may then change their address and any subsequent purchase orders should have the new address. It's more complicated than this, but that's an example requirement.

Option 1 Put all the address columns as attributes on the various tables. Copy the details down from the supplier to the PO as it is created. Potentially store multiple copies of the

Option 2 Create a separate address table. Have a foreign key from the supplier and purchase order tables to the address table. Only allow insert and delete on the address table as updates could change more than you intend. Then I would have some scheduled task that deletes any rows from the address table that are no longer referenced by anything so unused rows were not left about. Perhaps also have a unique constraint on all the non-pk columns in the address table to stop duplicates as well.

I'm leaning towards option 2. Is there a better way?

EDIT: I must keep the address on the purchase order as it was when sent. Also, it's a bit more complicated that I suggested as there may be a delivery address and a billing address (there's also a bunch of other tables that have address information).

After a while, I will delete old purchase orders en-masse based on their date. It is after this that I was intending on garbage collecting any address records that are not referenced anymore by anything (otherwise it feels like I'm creating a leak).

+1  A: 

Do you want to keep a historical record of what address was originally on the purchase order?

If yes go with option 1, otherwise store it in the supplier table and link each purchase order to the supplier.

BTW: A sure sign of a poor DB design is the need for an automated job to keep the data "cleaned up" or in synch. Option 2 is likely a bad idea by that measure

JohnFx
while I agree that needing a cron job to clean up your data is a VERY bad sign, Option 2 is the correct way to do this. Option 2 is the normalized solution. as cagcowboy notes, there shouldn't actually be a need for a cron job.
rmeador
In my opinion it is overnormalized and will create more headache than it is worth. Option 1 is fine as long as the addresses describe the entities in the tables they are added to, as long as they aren't duplicated excessively.
JohnFx
A: 

Why would any of the rows on the address table become unused? Surely they would still be pointed at by the purchase order that used them?

It seems to me that stopping the duplicates should be the priority, thus negating the need for any cleanup.

cagcowboy
Eventually I will delete old purchase orders so at some point an address may no longer be used by anything.
WW
If the address references are all foreign keys (so any table that references the addresses table has a formal foreign key to it), then the delete trigger or update-of-address-id trigger can attempt to delete the old address-id in the addresses table. If that succeeds, there were no references left.
Jonathan Leffler
Cont: If the triggered delete fails, it means that another row is still referencing the address row (probably; there could be other errors), and as long as the DBMS does not prevent you from ignoring that error (which it shouldn't), then you can continue with the referencing delete or update.
Jonathan Leffler
@Joanthan Leffler: Agree that your trigger approach is technically correct, but I think it's better for performance to delete address records in the background. There are 10 child tables of address in my current model.
WW
+8  A: 

I actually use this as one of my interview questions. The following is a good place to start:

Addresses
---------
AddressId (PK)
Street1
... (etc)

and

AddressTypes
------------
AddressTypeId
AddressTypeName

and

UserAddresses (substitute "Company", "Account", whatever for Users)
-------------
UserId
AddressTypeId
AddressId

This way, your addresses are totally unaware of how they are being used, and your entities (Users, Accounts) don't directly know anything about addresses either. It's all up to the linking tables you create (UserAddresses in this case, but you can do whatever fits your model).

One piece of somewhat contradictory advice for a potentially large database: go ahead and put a "primary" address directly on your entities (in the Users table in this case) along with a "HasMoreAddresses" field. It seems icky compared to just using the clean design above, but can simplify coding for typical use cases, and the denormalization can make a big difference for performance.

Eric Z Beard
Good solution; I've tended to use this sort of thing myself in the past. It works well. Actually, I've frequently used a more generic implementation, of having a "Contact" and "ContactType" that includes address; this handles users with no physical address but that have email, etc.
McWafflestix
I'm only likely to have a delivery address and and bill-to address, so I was thinking: PO.DeliveryAddressID and PO.BillToAddressID columns. It's not likely any further types will be added so a I think modelling the many-to-many is overkill.
WW
Question: What if a user/customer/company had more than one address?--a billing and a shipping address, for example. How about you store two address types against one user idea with that structure?
Mel
+1  A: 

I think I agree with JohnFx..

Another thing about (snail-)mail addresses, since you want to include country I assume you want to ship/mail internationally, please keep the address field mostly freeform text. It's really annoying having to make up an 5 digit zip code when Norway don't have zip-codes, we have 4 digit post-numbers.

The best fields would be:

  • Name/Company
  • Address (multiline textarea)
  • Country

This should be pretty global, if the US-postal system require zip-codes in a specific format, then include that too but make it optional unless USA is selected as country. Everyone know how to format the address in their country, so as long as you keep the linebreaks it should be okay...

Stein G. Strindhaug
I agree totally.
Stefan
+3  A: 

Option 2, without a doubt.

Some important things to keep in mind: it's an important aspect of design to indicate to the users when addresses are linked to one another. I.e. corporate address being the same as shipping address; if they want to change the shipping address, do they want to change the corporate address too, or do they want to specify a new loading dock? This sort of stuff, and the ability to present users with this information and to change things with this sort of granularity is VERY important. This is important, too, about the updates; give the user the granularity to "split" entries. Not that this sort of UI is easy to design; in point of fact, it's a bitch. But it's really important to do; anything less will almost certainly cause your users to get very frustrated and annoyed.

Also; I'd strongly recommend keeping around the old address data; don't run a process to clean it up. Unless you have a VERY busy database, your database software will be able to handle the excess data. Really. One common mistake I see about databases is attempting to overoptimize; you DO want to optimize the hell out of your queries, but you DON'T want to optimize your unused data out. (Again, if your database activity is VERY HIGH, you may need to have something does this, but it's almost a certainty that your database will work well with still having excess data around in the tables.) In most situations, it's actually more advantageous to simply let your database grow than it is to attempt to optimize it. (Deletion of sporadic data from your tables won't cause a significant reduction in the size of your database, and when it does... well, the reindexing that that causes can be a gigantic drain on the database.)

McWafflestix
Once an address is in use, do not edit it for any reason. If you need to change something to a new address, see if that new address already exists (and use it) or insert a new one. Then attempt to delete the old address, but don't fuss (no error) if it cannot be deleted because it is still in use.
Jonathan Leffler
A: 

In the case of orders, you would never want to update the address as the person (or company) address changed if the order has been sent. You meed the record of where the order was actually sent if there is an issue with the order.

The address table is a good idea. Make a unique constraint on it so that the same entity cannot have duplicate addresses. You may still get them as users may add another one instead of looking them up and if they spell things slightly differently (St. instead of Street) the unique constraint won't prevent that. Copy the data at the time the order is created to the order. This is one case where you want the multiple records because you need a historical record of what you sent where. Only allowing inserts and deletes to the table makes no sense to me as they aren't any safer than updates and involve more work for the database. An update is done in one call to the database. If an address changes in your idea, then you must first delete the old address and then insert the new one. Not only more calls to the databse but twice the chance of making a code error.

HLGEM
The same entity may have more than one address. They can have the address they sent last year's POs to and another address from this year, etc. My proposed address table would not store details of the entity at all.
WW
I meant duplicates of the same address, not more than one address per entity.
HLGEM
A: 

I've seen every system using option 1 get into data quality trouble. After 5 years 30% of all addresses will no longer be current.

Stephan Eggermont