views:

513

answers:

3

I'm setting up a table that might have upwards of 70 columns. I'm now thinking about splitting it up as some of the data in the columns won't be needed every time the table is accessed. Then again, if I do this I'm left with having to use joins.

At what point, if any, is it considered too many columns?

+4  A: 

It's considered too many once it's above the maximum limit supported by the database.

The fact that you don't need every column to be returned by every query is perfectly normal; that's why the SELECT statement lets you explicitly name the columns you need.
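
As a rough illustration of naming only the columns you need, here is a minimal sketch using Python's built-in sqlite3 module. The `purchase_order` table and its columns are hypothetical, and SQLite is only standing in for whatever database you actually use.

```python
import sqlite3

# Hypothetical wide table; only a handful of its many columns are shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE purchase_order (
        id             INTEGER PRIMARY KEY,
        customer_name  TEXT,
        order_date     TEXT,
        total_amount   REAL,
        shipping_notes TEXT,
        internal_memo  TEXT
    )
    """
)
conn.execute(
    "INSERT INTO purchase_order VALUES "
    "(1, 'Acme', '2009-09-01', 250.0, 'Leave at dock', 'Rush order')"
)

# Name only the columns this query needs instead of using SELECT *.
row = conn.execute(
    "SELECT customer_name, total_amount FROM purchase_order WHERE id = ?",
    (1,),
).fetchone()
print(row)  # ('Acme', 250.0)
```

The other columns stay in the table but are never returned to the caller of this query.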

As a general rule, your table structure should reflect your domain model; if you really do have 70 (100, what have you) attributes that belong to the same entity, there's no reason to separate them into multiple tables.

ChssPly76
+1 cracked me up
Yannick M.
If you have a table "Person", you typically have columns like "name", "sex", "dateOfBirth", etc. If you start adding columns like "isSoccerPlayer" and "numberOfTeethPulled" just because the database's column limit has not been reached yet, not only are you crazy and creating a bad database, you are actually making it harder to work with. You might think you are making it easier, but you really are not. You are fighting how databases work; look into normalization.
KM
@KM - that's why I said "attributes belonging to same entity on domain model". A high number of columns in a table does NOT make it denormalized; it's what said columns represent that matters. Besides, while normalization is definitely a good thing, it's NOT a solution to all life's problems. Trick question - do you think the number of votes next to an SO question / answer is calculated as `select count(*) from votes` every time, or do you think that perhaps it's denormalized? Does that make the SO database bad and Jeff Atwood crazy?
ChssPly76
@ChssPly76, it is a relational database, not an object model. There are tables, rows and columns; work within that constraint if you want max performance, or mimic your objects for convenience at the expense of performance. So should every piece of information about a person be stored within the same row? No, break them out and group them into different tables (using my example from my previous comment): "Person", "Activities", "HealthRecords". Storing a SUM for performance reasons is a completely different issue than keeping all data in 70 columns to avoid joins.
KM
Should "numberOfTeethPulled" be a part of the Person record? No, it probably shouldn't be stored at all - you'll get that info from "ToothExtractionRecord" if your domain model requires such a level of detail. But that's YOUR (and, dare I say, rather contrived) example - it has nothing to do with my point: a large number of columns in a table does NOT mean the table is denormalized. Think real estate contracts / purchase orders / other financial documents, just to name a few examples. Can they be further split up into multiple tables? Yes. Any reason to do so? Not really.
ChssPly76
@ChssPly76, name 70 columns for a real estate contract or a purchase order that belong on one row?
KM
+4  A: 

There are some benefits to splitting up the table into several with fewer columns, which is also called Vertical Partitioning. Here are a few:

  1. If you have tables with many rows, modifying the indexes can take a very long time, as MySQL needs to rebuild all of the indexes in the table. Having the indexes split over several tables could make that faster.

  2. Depending on your queries and column types, MySQL could be writing temporary tables (used in more complex select queries) to disk. This is bad, as disk I/O can be a big bottleneck. This can occur if the query involves TEXT or BLOB columns.

  3. Wider tables can lead to slower query performance, since each row takes up more space and more data has to be read to scan the same number of rows.

Don't prematurely optimize, but in some cases, you can get improvements from narrower tables.
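
For illustration, vertical partitioning might look roughly like the sketch below. It uses Python's built-in sqlite3 module rather than MySQL so that it runs as-is, and the `person` / `person_detail` tables and their columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Frequently read columns stay in the main, narrow table.
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Rarely read, bulky columns move to a side table sharing the same key.
    CREATE TABLE person_detail (
        person_id INTEGER PRIMARY KEY REFERENCES person(id),
        biography TEXT,
        photo     BLOB
    );
    INSERT INTO person VALUES (1, 'Ada');
    INSERT INTO person_detail VALUES (1, 'Long biography text...', NULL);
    """
)

# Common case: read the narrow table only, no join required.
names = conn.execute("SELECT name FROM person").fetchall()

# Occasional case: join on the shared key to reach the bulky columns.
full = conn.execute(
    """
    SELECT p.name, d.biography
    FROM person AS p
    JOIN person_detail AS d ON d.person_id = p.id
    WHERE p.id = ?
    """,
    (1,),
).fetchone()
```

The trade-off is the one raised in the question: queries that need the detail columns now pay for a join, so this is only worth doing if most queries can stay on the narrow table.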

jonstjohn
+4  A: 

It is too many when it violates the rules of normalization. It is pretty hard to get that many columns if you are normalizing your database. Design your database to model the problem, not around any artificial rules or ideas about optimizing for a specific db platform.

Apply the following rules to the wide table and you will likely have far fewer columns in a single table.

  1. No repeating elements or groups of elements
  2. No partial dependencies on a concatenated key
  3. No dependencies on non-key attributes
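
As a small sketch of the first rule only (no repeating groups), here is one way a run of `phone1`, `phone2`, `phone3` columns could be moved into a child table. It uses Python's built-in sqlite3 module, and all table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Before: a repeating group (phone1, phone2, phone3) widens every row.
    CREATE TABLE contact_wide (
        id     INTEGER PRIMARY KEY,
        name   TEXT,
        phone1 TEXT,
        phone2 TEXT,
        phone3 TEXT
    );
    INSERT INTO contact_wide VALUES (1, 'Ada', '555-0100', '555-0101', NULL);

    -- After: one row per phone number in a child table.
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact_phone (
        contact_id INTEGER REFERENCES contact(id),
        phone      TEXT NOT NULL,
        PRIMARY KEY (contact_id, phone)
    );
    INSERT INTO contact SELECT id, name FROM contact_wide;
    INSERT INTO contact_phone
        SELECT id, phone1 FROM contact_wide WHERE phone1 IS NOT NULL
        UNION ALL
        SELECT id, phone2 FROM contact_wide WHERE phone2 IS NOT NULL
        UNION ALL
        SELECT id, phone3 FROM contact_wide WHERE phone3 IS NOT NULL;
    """
)

# Each contact can now have any number of phones without adding columns.
phones = [
    r[0]
    for r in conn.execute(
        "SELECT phone FROM contact_phone WHERE contact_id = ? ORDER BY phone",
        (1,),
    )
]
```

Three columns collapse into one, and a fourth phone number becomes a new row rather than a schema change.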

Here is a link to help you along.

JohnFx