views:

54

answers:

3

Hello All,

Pardon the elementary question but my newness to the realm of database design leaves me in a bind quite often.

I have a site that keeps growing with regard to families of information. In the beginning I had one sort of item I was describing and all was well. That item occupied one record and had 34 columns (a lot now that I look back) attributed to it of descriptive data. As I get more and more into this stuff, I see that many developers break out data (when practical) into distinct tables.

I've now got additional tables that relate to the original item but are not always needed when describing the original item so I broke them out so they're not queried unnecessarily.

Anyhow, I have a new item I've been trying to organize which is a USER. The user table has typical columns like username, email, last_login, path to associated image, etc. These users have been making comments, which I keep in yet another table that contains columns with IDs that relate to the user and the item on which they are commenting.

Now... I am in the process of adding the obligatory user profile page to the site. Should I create yet another table containing only essential profile data or append the existing user record with profile data in the original user table? I am thinking housekeeping might be a pain if I am to add a "Remove me from site" function as I would have to run something that kills the user record, the user profile record, and any other data associated with that user ID in other tables.

Basically what I am asking is should I keep going with this "granular" design method - breaking everything out into essential parts or does it ever serve me to consolidate into larger tables? I see a few instances where if a user deletes their account, I'll be left with a bunch of non-relevant data. For instance, the original item are restaurants... if I make a table to record "Visits" to restaurants, containing the Restaurant ID and the User ID, if the user or restaurant get removed from the site, this "Visits" table will have a bunch of useless records saying either "non existent restaurant was visited by user 45" or "Restaurant 21 was visited by non-existent user"

I hope I make sense here... I'm just wondering if it's normal to end up with this "junk" data over time.

Thanks much, Rob

+2  A: 

Deleting that "on-relevant" data is a normal, healthy part of an application's life. It's just what happens. You just have to do it, like you brush your teeth or make your bed. Don't let two or three DELETE queries influence how your tables get structured. They're not that expensive, and honestly, if you think that's too much of a pain, you're in the wrong business :)

If you're using InnoDB tables, you can look into foreign key constraints that will take care of some of the cleanup for you.

Scott Saunders
Thanks Soctt.... I'm going to give you solve credit as I guess this entire matter is very subjective and context related. I will look into f k c.
rob - not a robber
+2  A: 

You'll be able to make these decisions much more easily if you learn about normalization: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

Rob
Thhanks, Rob. Nice name
rob - not a robber
+1  A: 

In general, if data all relates to the same logical entity -- the same "thing" -- then it should go in the same table. Breaking one table into two just to keep the tables smaller is generally not a good idea. Depending on what you are doing, it may or may not make queries faster, and it introduces unnecessary complexity. Let me explain.

Whether it makes queries faster depends on the nature of the data and how you use it. If you have some very large field, like "rambling_comments varchar(5000)" or some such, and it is rarely used, then breaking it into a separate table so that what's left in the "main" table is relatively small could indeed make your queries faster, for the fairly obvious reason that there is now less data to read. But if the size of the fields you are thinking of breaking out are modest, and you often need data from both tables, then queries that only use one table don't gain that much, and queries that use both now need to do a join, which is usually more expensive than reading a somewhat bigger record.

But breaking up your tables will certainly make your programs more complex. Now you have to keep track of which data is in which table. You'll constantly be checking if that field is in the Item_Descriptive_Data table or the Item_Stock_Data table or whatever. You're liable to lose track at some point and accidentally put the same field into two tables. (Or worse, you'll decide this is a good idea and do it deliberately.) Then you have redundant and potentially contradictory data.

You have to do joins every time you need data that crosses tables. You create the possibility that records in one or more of the tables may not exist. Like, if you break your User table into User_Main and User_Profile, and you need data from both tables so you do a join, what happens if there is a record in User_Profile with no corresponding record in User_Main? You're going to have to add code to check for the possibility and deal with it. Oh, and blithely saying "That can never happen, no need to worry about it" is a very dangerous attitude: No matter that it's not SUPPOSED to happen, sooner or later it will, and if you don't handle the error gracefully, you could have a real mess.

In short, breaking up tables for performance reasons is usually a premature optimization. If you find that you have some real performance problem, THEN look at the tables and see if you should denormalize for efficiency. But don't start out trashing your database just to avoid a problem that might possibly happen someday.

Jay
Gotcha Jay. Thanks. I guess it is easier to make the tables huge and break later.
rob - not a robber