views:

402

answers:

8

I am wondering when and when not to pull a data structure into a separate database table when it appears in several tables.

I have pulled the 12 attribute address structure into a separate table because I have a couple of different entities containing a single address in this format.

But how about my 3 attribute person name structure (given, middle, surname)?

Should this be put into its own table referenced with a foreign key for all the entities containing a name... e.g. the company table has a contact person name, the citizen table has a person name etc.

Are these best left as attributes in the main tables or should they be extracted?

+1  A: 

It's about not repeating information, so you don't want to store the same information in two places when one will do.

Another useful rule of thumb is one entity per table. If you find that one table contains, say, "person" AND "order" then you probably should split those into two tables.

And (putting myself at risk of repeating information...) you might find it helpful to review some database design basics, there are plenty of related questions here on stackoverflow.

Start with these...

What is normalisation?

What is important to keep in mind when designing a database

How many fields is 'too many'?

More tables or more columns?

Ed Guiness
But since the new PersonName table rows are being pointed to with foreign keys then the PersonName information will not dissapper just by deleting a Company row or a Citizen row. However its exsistence is only justified by the existence of these pointing rows.
lox
It would be wasteful to break out a seperate table just for PersonNames. ;) But if you did, you could use Cascade Delete so that when the Person is deleted, the database deletes the corresponding PersonName. In SQL Server, that's an option on the relationship.
Brian MacKay
Now if that PersonName was used by several different things - lets say PersonNameID 5 was Brian MacKay for instance and it appeared in PersonID 200 and CitizenID 120, you could no longer delete PersonID 200 because the Cascade would fail. So: either combine Person and Citizen into one table, or simplify your life by not breaking the name structure out to a seperate table, or both. I recommend doing both.
Brian MacKay
A: 

Extract them. Your aim should be to have no repeating data in your database. Read about Normalization

DanDan
You know, with things like First Name/Last Name, it's not necessarily repeating data, but repeating data structures. To me it's okay to repeat these structures as long as you aren't repeating data. The trade-off in terms of simplicity is worth it.
Brian MacKay
No, this is fine. The problem is when you are replicating the same customer throughout your database, ie "Brian MacKay" appearing in every table. There's no way to ever consolodate this user then to a single user (there may be purchases made by multiple yous).
DanDan
Yes exactly DanDan. So you want to be use PersonID 5, which is Brian MacKay, as your foreign key. I can't think of many cases where you would want to instead use PersonNameID 2030. You would probably end up wanting to get at some Person data, and you would have to do work backwards to find out what Person the Name belongs to. Unecessary headache.
Brian MacKay
+1  A: 

Creating a person entity across your data model will give you this present and future advantages -

  1. The same person occurring as a contact, or individual in different contexts. Saves redundancy.
  2. Info can be maintained and kept current with far-less effort.
  3. Easier to search for a person and identify them - i.e. is it the same John Smith?
  4. You can expand the information - i.e. maintain addresses for this person far more easily.
  5. Programming will be more consistent and debugging will be easier as well.
  6. Moves you closer to a 'self-documenting' system.
blispr
Just be very careful that your entities are semantically the same, and don't just happen to have the same attributes at this point in time...
Benjol
A: 

It really depends on the problem you are trying to solve. In general it is probably a good idea to have some sort of 'person' table which holds details of people. However, there are occasions where that is potentially a very bad idea.

One example would be if you are holding details of prescriptions written out to people by a doctor. In some countries it is a legal requirment that the prescription details are held with the name in which they were prescribed NOT the name the person is going under currently. For instance a woman might be prescribed a drug as miss X, but then she gets married and becomes Mrs Y. If you had a person table that was linked to the prescriptions table you would now have the wrong details and would possibly face legal consequences. In that case you would need to probably copy the relevant details of the person into the prescription table, even though this would be duplicating data.

So again - it depends on the problem you are trying to solve. Don't just blindly follow what people consider to be best practices. Understand your data and any issues surrounding it, then try to follow best practices that fit.

Russell Troywest
+1  A: 

I would usually keep the address on the Person table, unless there was an unusual need for absolutely uniform addresses on each entity, or if an entity could have an arbitrary number of addresses, or if addresses need to be shared between entities, or if it was a large enterprise product where I know I have to invest in infrastructure all over the place or I will end up gutting everything down the road.

Having your addresses in a seperate table is interesting because it's flexible, but in the context of a small project lacking a special need like the ones mentioned above, it's probably a slight waste. Always be aware of the balance between complexity and flexibility. Flexibility is important, but be discriminating... It's easy to invest way too much there!

In concrete terms, the times that I experimented with (for instance) one-to-one relationships for things like addresses, I ended up refactoring them back into the table because it introduced a bunch of headaches including more complex queries, dealing with situations where the address does not exist, etc. More entities also increases your cognitive load -- it makes the project harder to think about. In my case, it was an unecessary cost because there was no concrete need and, in truth, not even a gain in flexibility.

So, based on my experiences, I would "try" to keep the addresses in the same table, and I would definitely keep the names on them - again, unless there was a special need.

So to paraphrase Einstein, make it as simple as possible and no simpler. But in the short term, experiment. It's the best way to learn these lessons.

Brian MacKay
The address format is absolutely uniform, only one address per enitity, no shared adresses, it is a large enterprise solution. But if each entity points to an address in the address entity I loose the assurance the when an entity is deleted the address follows. Pointing from the address entity to the other would be impractical since their primary keys formats are not identical.
lox
Well, if you end up doing a one-to-one relationship I have a feeling you won't be able to cascade delete. You might have to use triggers which to me is not ideal. You could set up a one-to-many, but then you would end up having a bunch of nullable keys and so on, which is maybe okay if you really want to break it out. I've done that with logging once and it was okay -- I had one logging structure that had nullable foreign keys for users and also a few others. It seemed to work fine. You would have to just understand that even though it's a one-to-many, for now each person only has one address.
Brian MacKay
You could actually look at this as added flexibility... For instance, maybe down the road you will actually want to introduce more addresses. This way the structure is already in place. However, at the same time it's going to be a little harder to work with than if it was just on the Person record.
Brian MacKay
A: 

As a counterpoint to the other (entirely valid) replies: within your application's current structure, how likely will it be for a given individual (not just name, the actual "person" -- multiple people could be "John Smith") to appear in more than one table? The less likely this is to happen, the less likely you are to get benefits from normalization.

Another way to think of it is entities. Outside of labels (names), is their any overlap between "customer" entity and an "employee" entity?

Philip Kelley
A: 

Depends on what you're using the database for.

If you want fast queries on your tables you should de-normalize your tables. Having to run multiple JOIN's will take longer and make your queries more complex.

On the other hand if your intention is to have a flexible storage database which is not meant to be hit with a ton of fast-response queries, then normalizing the tables by splitting them out into multiple xref'ed tables will provide more flexibility in your design and reduce the need for submitting duplicated data.

Since de-normalization is "optimization", I would suggest you normalize the tables first, index them properly and see if you're getting any bottlenecks on your queries. If so, flatten the affected tables where needed.

ascotan
A: 

You should really consider your whole database structure and do a ER diagram (entity relationship diagram) first. OF COURSE there should be another table called "Person" where the concept of a person is stored...

Dimitri Wetzel