views:

67

answers:

3

The current situation is that topics are sorted by 3 main categories. There is the potential to add more than just the 3 categories but the higher ups wish to implement the ability to add more than just 1 category to a topic.

My original db design has the the categoryID as a Foreign Key in the topic info table. Which was probably a bad idea from the start but I figured they were set on only having 3 categories and doing it this way would allow for fewer queries.

So from what I can see I have two options now: 1) Input the categoryID as a comma separated string that I parse on the php end. 2) Restructure the DB and pull out the categoryID into its own table of categoryID and topicID.

I was wondering what everyone thought of this. My first instinct was to restructure the database. But the first option when I think about it is the easiest to implement and least likely to break something existing by changing the db around. This can also lead to de-normalization however and open up the possibility of inconsistent data.

I have read the de-normalizing is fine so long as you accept the risk of having inconsistent data in exchange for performance. In your opinion will I gain much in performance for this risk? Any input on what I should do in this situation would be appreciated.

Thanks for the help,
Levi

+3  A: 

Do not confuse denormalization (a good example of which is keeping the number of votes on SO question together with question as opposed to calculating it each time from 'votes' table) with the abomination that is comma-separated list of ids.

Model a proper many-to-many relationship; there are just so many things that can (and will) go wrong with comma-separated approach. To name a few:

  1. No referential integrity.
  2. Next to impossible to use in joins.
  3. Impossible to adequately index; non-scalable.
ChssPly76
All good points, your third one about being non-scalable is how I got into this problem originally with my DB design.
Levi
+1 Also it's harder it is to do various aggregate queries, like counting, summing, or averaging the values in the list.
Bill Karwin
A: 

Your best option would be to have a database, like you said, of categoryID-topicID pairs to find which categories the topics belong to.

You COULD do it the other way by exploding the strings in categoryID, but when you search for any topics that are in a certain category, you'll have to run through each field and run a LIKE on it... Much more resource intensive.

Take the time to restructure the DB and you'll end up with a much better result.

BraedenP
A: 

If you need to do something in the DBMS with the individual items, do not store them in list form. This will make your queries run like a dog as your tables get bigger. Of course, if you're only ever going to treat the list as a unit, it's okay to store them that way.

But you'd better be sure you're going to always treat the list as a unit, and no cheating, saying they're a unit and then spiltting them apart somewhere else - better to let the DBMS do that for you.

You should always do 3NF first then if, and only if, you have performance problems, denormalize for speed.

Those fields you're talking about in the question are not the sort that you will be treating as a unit. You will need to do things to the individual elements in the lists, so they should be broken out into another table.

paxdiablo
Ok, I am going to break it into another table, sounds like I will have fewer errors down the road. Just curious, is a list treated as a unit only when it is going to be displayed as is? So if I have 'apples,oranges,bananas,kiwis' and I use it to output 'I like apples,oranges,bananas,kiwis'. That is fine to do?Outputting this,'I like: -apples -kiwis -bananas -oranges'would be a bad idea though because I would have to take the string and split it apart to display it in that order?Is there a real world example you can think of using a list as a unit?
Levi
Yes, that's fine. If you're just going to pull it out as-is and not manipulate it, you can store it like that. That's because there's no extra work for the DBMS or your program to process it. In that case it's actually *better* to do it that way since it's faster than recombining individual columns into a string :-)
paxdiablo
A real world example? How about a dating site where you list your interests (which could be *anything* so may not make sense as a lookup table). You may well enter "walks on the beach, Java development, and torture devices from the 16th century" and this would be presented as-is to prospective candidates (who would then, in all honesty, probably run like hell).
paxdiablo