views:

55

answers:

3

Here is the scenario: You have a Persons table with a one-to-many relationship with an Addresses table, where one of the Address rows is the "primary" Address.

Is it better in a normalized schema to

  • Use a Persons.PrimaryAddressID to access the "primary" Address for a Person

or

  • Use an Addresses.IsPrimary bit column to reference the "primary" Address for a Person via Addresses.PersonID

or

  • Other

and why?

A: 

I would go for "Use a Persons.PrimaryAddressID to access the "primary" Address for a Person". Primary Address has a meaning only when Person is linked to Addresses. So it should belong to Person. Think about following scenarios where 2nd approach fails.

a) Address is used with another entity without referenced to person where Addresses.IsPrimary is meaningless.

b) Same address is used by two persons where 1st uses as primary while 2nd not.

Sujee
Good point about how "primary" only has meaning when the link exists.
erg39
A: 

It depends on whether the person-to-address relationship is one-to-zero-plus or one-to-one-plus.

If a person is required to have a primary address, I would put it in the Persons table itself (since it's a required attribute).


In the other hand, if a person can exist in your schema without an address, I would leave all addresses in the Addresses table as equal and use an attribute of the Persons table to select the primary (either NULL or a pointer to the relevant Addresses row).

If you store the primality of an address in the Addresses table, what do you do when two addresses for Bob Smith both claim to be the primary? You could stop that with triggers but it's far more efficient to design your schema properly.

And, if two room-mates share the same address, but one lives there all the time and the other spends most of his time shacked up with his girlfriend, what happens then? If the primality is in the Addresses table, you won't be able to share address rows between persons.


What I'm trying to get across is that you need to allocate your attributes to the right objects. A person's primary address belongs to a person, not an address.

For maximum efficiency and flexibility, I would have the following schema:

Persons:
    Id                 primary key
    PrimaryAddressId
    OtherStuff
Addresses:
    Id                 primary key
    OtherStuff
PersonAddresses:
    Id                 primary key
    PersonId           foreign key on Persons(Id)
    AddressId          foreign key on Addresses(Id)

You have the minor data integrity problem that Persons.PrimaryAddressId may be a hanging pointer. You can't make it a foreign key to one of the primary keys since you want it to allow NULL. That means you'll have to cater for the possibility that it might point to a non-existent Addresses.Id.

I would simply fix that as a before-delete trigger on Addresses so that the relevant Persons rows are updated (setting PrimaryAddressid to NULL).

Or you could be tricky and have one address of "Unknown" in the Addresses table so that every row in Persons has at least one address (those whose primary address is unknown automatically get their PrimaryAddressid set to the "Unknown" address row.

Then you could make it a proper constrained relationship and simplify your SQL somewhat. Pragmatism often beats dogmatism in the real world :-)

paxdiablo
Would you still use this design if the Persons->Addresses relationship was constrained to one-to-many, as opposed to the man-to-many this permits? In that case, there may be occasional duplicate Addresses, but in your roommate scenario, for instance, if one roommate moves out you do not want to change the Address for both roommates.
erg39
If it's truly one-to-many, I'd rethink it. But I don't believe this particular situation warrants it. Making this one-to-many duplicates data in the database unnecessarily. Re the moving situation, you wouldn't update the addresses row for the science geek staying behind, you'd simply change the computer nerds primary to point to a new address record, already existing if his girlfriend is on-campus or just add a new one if she goes to the cheerleading school up the road :-)
paxdiablo
Technically there shouldn't be duplicate rows in a table at all. An address is an object and all attributes for that object should be in it's single row. What happens when you have two rows for 7 Smith Street and they have different ideas about the address (like rent to charge, or rooms left to rent)? DBAs travel the globe trying to remove problems like these. Don't make me come over there :-)
paxdiablo
The current prototype actually does what you note above, inserts a new, "Unknown" Addresses row any time a Person is saved without a PrimaryAddressID.Re duplicate Addresses: I understand what you are saying, but the number of duplicate Addresses will be relatively small, and it seems that implementing potentially buggy business logic to deal with the various cases around duplicating and un-duplicating Addresses would be potentially less robust than simply having a few duplicates. No?You may now rap my knuckles with a ruler :)
erg39
A: 

If you want the constraint to be that one person has at most one primary address, a Persons.PrimaryAddressID is clearly simpler -- indeed, it's enforced by the very schema. It's also easy to ensure exactly one primary address per person (just make that column be not null), and if you need to, even say that no two people can share a primary address (just make that column be unique).

Of course, exactly because this approach excels at enforcing such simple constraints, it's bad when you don't want these constraints -- for example, if you want to make it possible for a person to have more than one "primary" address, the approach in question would not work.

Incidentally, I would not consider a one-Person/many-addresses relationship to be particularly good unless you want to enforce the fact that no two people can share the same address: in general, in a normalizing mood, I'd rather have a table of people, one of addresses, and one for the relationship (which in most contexts would naturally be many-to-many, since many people can, and in real life do, share the same address).

If you choose to go this route, then, especially if you need high flexibility (multiple primary addresses &c), having the relationship table carry the "primality" boolean would be an attractive choice (it still makes it not too hard to enforce some of the above-mentioned constraints, though other constraints, such as "an address belongs to at least one person" or vice versa, can be tricky to express simply).

Lesson to retain: express exactly what constraints your schema needs to express simply, and the right schema for that purpose often emerges quite clearly. If the constraints of interest are a mystery, so will be the answer to the question "what's the right schema";-).

Alex Martelli