views:

750

answers:

12

Every Customer has a physical address and an optional mailing address. What is your preferred way to model this?

Option 1. Customer has foreign key to Address

   Customer   (id, phys_address_id, mail_address_id)
   Address    (id, street, city, etc.)

Option 2. Customer has one-to-many relationship to Address, which contains a field to describe the address type

   Customer   (id)
   Address    (id, customer_id, address_type, street, city, etc.)

Option 3. Address information is de-normalized and stored in Customer

   Customer   (id, phys_street, phys_city, etc. mail_street, mail_city, etc.)

One of my overriding goals is to simplify the object-relational mappings, so I'm leaning towards the first approach. What are your thoughts?

+3  A: 

The second option would probably be the way I would go. And on the off-chance it would let users add additional address' (If you wanted to let them do that), that they could switch between at will for shipping and such.

Erling Thorkildsen
+4  A: 

I'd prefer #1. Good normalization and communicates intent clearly. This model also allows the same address object (row) to be used for both addresses, something I have found to be quite valuable. It's far too easy to get lost in duplicating this information too much.

krosenvold
I'd be interested in how this was done in the UI. I have built a system that allows address objects to be shared, but users didn't get the concept.
cdonner
I think you mean the second one, based on your description. They're all marked #1
singpolyma
@singpolyma No. Question is edited now, too..
krosenvold
+1  A: 

I'd go for the first option. In these situations I'm very weary of YAGNI (you aren't going to need it). I can't count the number of times I've looked at schemas that've had one-to-many tables "just incase" that are many years old. If you only need two, just use the first option; if the requirement changes in the future, change it then.

James Gregory
+6  A: 

I tend towards first approach for all the usual reasons of normalisation. This approach also makes it easier to perform data cleansing on mailing details.

If you are possibly going to allow multiple addresses (mail, residential, etc) or wish to be able to use effective dates, consider this approach

   Customer   (id, phys_address_id)
   Cust_address_type (cust_id, mail_address_id, address_type, start_date, end_date)
   Address    (id, street, city, etc.)
Karl
+1  A: 

Like in many cases: It depends.

If your customers deal with multiple addresses then a to-many relationship would be appropriate. You could introduce a flag on address that signals if an address is for shipment or bill, etc. Or you store the different address types in different tables and have multiple to-one relationships on a customer.

In cases where you only need to know one address of a customer why would you model that to-many? A to-one relationship would satisfy your needs here.

Important: Denormalize only if you encounter performance issues.

Markus Lux
+1  A: 

Option 3 is too restrictive, and option 1 cannot be extended to allow for other address types without changing the schema. Option 2 is clearly the most flexible and therefore the best choice.

cdonner
I agree that it is the most flexible, but is it worth the trade-off in ORM complexity?
Jen
+1  A: 

I would go with option 1. If you want to, you could even modify it a little bit to keep an address history:

Customer   (id, phys_address_id, mail_address_id)
Address    (id, customer_id, start_dt, end_dt, street, city, etc.)

If the address changes, just end date the current address and add a new record in the Address table. The phys_address_id and mail_address_id always point to the current address.

That way you can keep a history of addresses, you could have multiple mailing addresses stored in the database (with the default in mail_address_id), and if the physical address and mailing address are identical you'll just point phys_address_id and mail_address_id at the same record.

Thomas Müller
-1 - With this design you've got a two-way dependency between the Address and Customer tables, which makes the database schema more brittle, while adding no real benefit.
Greg Beech
+3  A: 

One important fact you may need to consider (depending on your problem domain) is that people change addresses, and may want to let you know in advance of their address change; this is certainly true for utility companies, telcos, etc.

In this case you need to have a way to store multiple addresses for the customer with validity dates, so that the address can be set up in advance and automatically switch at the correct point. If this is a requirement, then a variation on (2) is the only sensible way to model it, e.g.

Customer (id, ...)
Address (id, customer_id, address_type, valid_from, valid_to)

On the other hand, if you don't need to cater for this (and you're sure you won't in the future) then probably (1) is simpler to manage because it's much easier to maintain data integrity as there's no issues with ensuring only one address of the same type exists, and the joins become simpler as they're only on one field.

So either (1) or (2) are fine depending on whether you need house-moves, but I'd steer clear of (3) because you're then repeating the definition of what an address is in the table, and you'll have to add multiple columns if you change what an address looks like. It's possibly slightly more performant, but to be honest when you're dealing with properly indexed joins in a relational database there isn't a lot to be gained, and it's likely to be slower in some scenarios where you don't need the address as the record size for a customer will be larger.

Greg Beech
+2  A: 

When answering those kinds of questions I like to use the classifications of DDD. If it's a Entity it should have a separate ID, if it's a value object it should not.

Tobias Hertkorn
+1  A: 

In most code I write nowadays every customer has one and only one physical location. This is the legal entity beeing our business partner. Therefore I put street, city etc in the customer object/table. Often this is the possible simplest thing that works and it works.

When an additional mailing address is needed, I put it in a separate object/table to not clutter the customer object to much.

Earlier in my career I normalized like mad having an order referencing a customer which references a shipping address. This made things "clean" but slow and inelegant to use. Nowadays I use an order object which just contains all the address information. I actually consider this more natural since a customer might change his (default?) address, but the address of a shipment send in 2007 should always stay the same - even if the customer moves in 2008.

We currently implement the VerySimpleAddressProtocol in out project to standardize the fields used.

mdorseif
+2  A: 

We are moving forward with a model like this:

Person (id, given_name, family_name, title, suffix, birth_date)
Address (id, culture_id, line1, line2, city, state, zipCode, province, postalCode)
AddressType (id, descriptiveName)
PersonAddress (person_id, address_id, addressType_id, activeDates)

Most may consider this excessive. However, an undeniable common theme amongst the apps we develop is that they will have some of these fundamental entities - People, Organizations, Addresses, Phone Numbers, etc.. - and they all want to combine them in different ways. So, we're building in some generalization up-front that we are 100% certain we have use cases for.

The Address table will follow a table-per-hierarchy inheritance scheme to differentiate addresses based on culture; so a United States address will have a state and zip field, but Canadian addresses will have a province and postal code.

We use a separate connecting table to "give" a person an address. This keeps our other entities - Person & Address - free from ties to other entities when our experience is this tends to complicate matters down the road. It also makes it far simpler to connect Address entities to many other types of entities (People, Organizations, etc.) and with different contextual information associated with the link (like activeDates in my example).

qstarin
+1 That is the design I would have posted. I've used that connecting table to drive a lot of functionality. Data cleansing even marketing mailers. It comes in useful when those who want reports can see what customers where mailed to in the last campaign, etc..
Optimal Solutions
+2  A: 

I've found that what appears to be an initially sound "customer master" design often later breaks because: previously treating a business as the customer evolves into treating individual employees as the customers, or a customer will change/split/merge addresses, or a business changes its name but we still want to consolidate old and new performance totals, or a shipping address and billing address now have to be expanded to include a support address, or operators forget or mistake one address purpose for another, or a customer wants to use a special shipping address only temporarily, or etc etc.

As a result, I've come to abandon the idea of a master customer file altogether. Instead, name/company/address information are never master records (except for applications like utility billing or property taxes where a specific physical address is never editable into another address), they're just fields denoting the contact in use AT A PARTICULAR POINT IN TIME, usually inside something like a sales order record. Each sales order is chained to the previous and next order for that customer, even when the customer changes their name or address. The advantage is that all orders can be consolidated/totaled/analyzed across the customer's entire history of transactions, even though each order might vary the contact name or address. It's somewhat counterintuitive, especially when attempting to please normalizing db designers, but it ends up being very flexible and convenient.

For example, when customer X first places an order, no customer record is created. Instead, a sales order record is created which contains the necessary name/company/address information in effect at the time of the order. When customer X places his second order, we don't search a customer file, we search the sales order file, then copy/chain it to create his second sales order. If he wants to change his name/company/address info, fine, we just edit those fields in sales order #2 and sales order #1 remains unchanged. Now he's locatable under either variation (order 1 or 2).

For other considerations when trying to decide if two customer records are really the same, see http://semaphorecorp.com/mpdd/mpdd.html

joe snyder