views:

209

answers:

7

I've noticed that a lot of folks here cite tables with 20+ (I've seen as much as 55) columns in one table. Now I don't pretend to be a database design expert, but I've always heard that this is a horrible practice. When I see this, I usually suggest splitting into two tables with a one to one relationship: one containing the most frequently used data, the other with the least often used data. Though at the same time, there's the possible issue of performance (less JOINs and such). So my question is this:

When it comes to really LARGE scale databases, is there actually an advantage to having a large amount of columns, despite the fact that this usually leads to many NULL values?

Which is more of a performance hit: lots of columns with lots of NULLs, or fewer columns with lots of JOINs?

+11  A: 

The design of the table depends on the entity it needs to store. If all the data belongs together, then 50 columns (or even 100) might be the correct thing to do.

So long as the table is normalized, there is no rule of thumb regarding size, apart from database capabilities and the need to optimize.

Oded
+1  A: 

I agree with Oded. I have seen tables with 500 columns in them, and all the columns in them were in the correct place. Just consider the number of facts one might wish to store about an everyday object, and you'll soon see why.

If it proves inconvenient to select all those columns, or to specify which columns to select when you are only interested in a small proportion of them, you may find it worthwhile to define a view.

Brian Hooper
A: 

Which is more of a performance hit: lots of columns with lots of NULLs, or fewer columns with lots of JOINs?

It is purely depends on data you store, indexes you make and so on. No one can ensure you that one works better than another without knowing what are you storing. Generally normalization rules will "force" you separate data to different tables and user FKeys if you have large table but i disagree that it ALWAYS performs better than one big table. You can end with 6-7 level joins in dozens of queries that sometimes will cause errors because there much more chances to create an error in larger queries that in simple ones.

If you post some requirements of what you are doing maybe we can help you with designing the DB properly.

eugeneK
A: 

odbc has a character limit of 8000 .... so that is a physical limit beyond which things get highly frustrating.

I worked on a table that had 138 columns .. it was horribly written and could of been normalised. Although this database seem to of been the creation of someone wondering why there are conventions in database design and deciding to test them all at once.

Having very wide flattened tables is fairly common when you get into data warehousing and reporting servers. They are just a lot faster and mean that you don't have to store your database entirley in ram for performance.

John Nicholas
+1  A: 

How many columns is too many columns?

When you feel it no longer makes sense or is right to add another column.

Generally depends on application.

graham.reeds
A: 

According to my experience it is better to have less joins as those tend to happen too often especially in big database. As long as your database tables are designed to store single entity (student, teacher and so on) this should be ok. So that this will be represented as an object in you code later. So, if you split the entity to several tables you will have to use several joins in order to fill your object later. Also if you use ORM to generate your data access layer (such as Linq in .Net) is will generate separate classes for each table (of course with an relationship between them but still) and this will be harder to use.

Another thing is that you can specify which columns to return in your query and this will reduce the data that is passed to your application, but if you need even a single column from another table you will have to do the join. And in most cases as you have so many columns, then the probability to have large amount of data stored in the db is high. So this join would harm more, than the NULLs.

Every project I have worked on is different so you should find the balance for each story.

Teddy
A: 

It also highly depends on the usecase for your table. If you want to optimize it for reading then it might be a good idea to keep it all together in one table.

In the NO-SQL world (cassandra/hbase for example) there are no constraints on the number of columns and it's actually considered a good practice to have many columns. This also comes from the way it is stored (no gaps). Worth while investigating.

Albert