I'm creating a table with 30-50 columns and about 200K rows. Is it recommended to store this data in separate tables? Are there performance issues when you have this many columns?

I'll explain a bit about the table. I have to store all sports games over the past 10 years (basketball, baseball, football, hockey). For each of these, I need to keep additional data. Some of this data allows me to reuse fields across sports. For example, every game has a home team, an away team, and an event date.

However, for each of these games I'm also storing things like how many first downs were achieved, how many strikeouts, and how many three-pointers. Obviously, this data only relates to some of the rows in the table. I end up having a lot of NULL fields in each row as a result.

I can give more specifics if necessary. Thanks in advance for any general advice.

+4  A: 

I think the problem is you have a model like this (the store-everything-in-one-table approach). This approach and also this approach are two of the alternatives you could pick; I'm sure others would have some more suggestions.

They all have their pros and cons. I can't comment on their performance characteristics in MySQL, but certainly the other approaches reduce the use of NULLs, which can only be a good thing.

If you are genuinely interested in the differences between the 3 approaches, I would recommend buying Martin Fowler's Patterns of Enterprise Application Architecture book.

In terms of the performance characteristics, you might want to look at questions like this one and also this one.

You can read about vertical partitioning in MySQL here.

RichardOD
But don't start partitioning until you're satisfied with your degree of normalization.
reinierpost
A: 

I would definitely look at normalizing the table. While I'm not sure about the performance benefits, there would most likely be a storage benefit with a large number of entries.

My first change would be to move any data that relates to only one or two sports into separate tables, each linked to the main table by a foreign key.

Farrell
+2  A: 

To elaborate on RichardOD's answer, you generally have three options when dealing with subtyping, and which you choose depends on what you need to do with the data in question.

The first option is the one you're currently using: keep all columns related to the different types in one table, with flags and nulls used to indicate which type a given record is. It is the simplest way to manage subtyping, and it generally works well when you only have a few types or if the different types aren't very different. In your case, it seems like the types can vary quite a bit.
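As a rough sketch of what this first option looks like in practice, here is a cut-down version of such a table. The table and column names are made up for illustration, and SQLite (through Python's built-in sqlite3 module) stands in for MySQL so the example can be run anywhere.

```python
import sqlite3

# Hypothetical single-table schema; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE games (
        id             INTEGER PRIMARY KEY,
        sport          TEXT NOT NULL,   -- flag saying which type the row is
        home_team      TEXT NOT NULL,
        away_team      TEXT NOT NULL,
        event_date     TEXT NOT NULL,
        first_downs    INTEGER,         -- football only
        strikeouts     INTEGER,         -- baseball only
        three_pointers INTEGER          -- basketball only
    )
""")
conn.executemany(
    "INSERT INTO games (sport, home_team, away_team, event_date,"
    " first_downs, strikeouts, three_pointers) VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("football", "Home A", "Away B", "2009-09-13", 21, None, None),
        ("baseball", "Home C", "Away D", "2009-07-04", None, 9, None),
    ],
)

# Count how many of the sport-specific columns are NULL in each row.
null_counts = conn.execute("""
    SELECT sport,
           (first_downs IS NULL) + (strikeouts IS NULL) + (three_pointers IS NULL)
    FROM games
    ORDER BY id
""").fetchall()
print(null_counts)
```

With only three sport-specific columns, each row already leaves two of them empty; with 30-50 columns the proportion of NULLs grows accordingly.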

The second option is to keep a central table that contains all of the columns common to the subtypes, and have one-to-one relationships with other tables that contain the type-specific details.

The third option is to not think of the different types as subtypes at all and just keep all the types' records in separate tables. So you'd have no common table between the types that keeps the common data, and each table would have some columns that are repeated across tables.

Now, each option has its place. You'd use the first option when there aren't many differences between the different types. You'd use the second option if you need to manipulate the common fields independently of the type-specific fields; for example, if you wanted to list all sports games in a big grid with general information, and then let users click to see the type-specific details of that game. You'd use the third option when the types aren't really very related at all and you're just storing them together out of convenience; dissimilar schemas, even if they share a few fields, shouldn't be merged.
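To make the second option's "grid plus details" use case concrete, here is a minimal sketch. All table and column names are hypothetical, and SQLite (via Python's sqlite3 module) is used in place of MySQL purely so the example is runnable; the same shape should carry over, but treat it as an illustration rather than a finished design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
    -- Central table: columns shared by every sport.
    CREATE TABLE games (
        id         INTEGER PRIMARY KEY,
        sport      TEXT NOT NULL,
        home_team  TEXT NOT NULL,
        away_team  TEXT NOT NULL,
        event_date TEXT NOT NULL
    );
    -- One-to-one detail tables: the primary key is also the foreign key.
    CREATE TABLE football_stats (
        game_id     INTEGER PRIMARY KEY REFERENCES games(id),
        first_downs INTEGER NOT NULL
    );
    CREATE TABLE baseball_stats (
        game_id    INTEGER PRIMARY KEY REFERENCES games(id),
        strikeouts INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO games VALUES (1, 'football', 'Home A', 'Away B', '2009-09-13')")
conn.execute("INSERT INTO games VALUES (2, 'baseball', 'Home C', 'Away D', '2009-07-04')")
conn.execute("INSERT INTO football_stats VALUES (1, 21)")
conn.execute("INSERT INTO baseball_stats VALUES (2, 9)")

# The "big grid": only the common table is needed, no NULL columns involved.
grid = conn.execute(
    "SELECT id, sport, home_team, away_team, event_date"
    " FROM games ORDER BY event_date"
).fetchall()

# The "click for details" view: join one game to its sport-specific table.
detail = conn.execute(
    "SELECT g.home_team, g.away_team, f.first_downs"
    " FROM games g JOIN football_stats f ON f.game_id = g.id"
    " WHERE g.id = ?",
    (1,),
).fetchone()
print(grid)
print(detail)
```

The detail tables can declare their columns NOT NULL, because a row only exists there when the statistic applies to that sport.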

So think about what you need to do with the data and how it fits into the three options and decide for yourself which is best. If you can't decide, update your question with the details about how you plan to use the data and I or someone else should be able to help you more.

Welbog
A: 

Yes, use lots of columns if that makes sense. As long as you're not using an antipattern like "field1, field2, field3", it's fine.

Lots of NULLs are fine; they don't hurt much. Also, 200K is such a tiny number of rows that you're unlikely to see many performance problems. I don't know how many inserts you're planning to do into this table, but if it is fewer than 100 per second, I don't see anything being a problem.

You will want to index it somehow. The number of indexes will affect insert performance, but I imagine that most of your columns won't need to be indexed.
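As a small illustration of indexing just the column you search on, the sketch below adds one index to a hypothetical games table and asks the database whether a lookup would use it. The names are invented, and this runs against SQLite through Python's sqlite3 module; MySQL's EXPLAIN output looks different, so take it as a demonstration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE games (
        id          INTEGER PRIMARY KEY,
        sport       TEXT NOT NULL,
        home_team   TEXT NOT NULL,
        away_team   TEXT NOT NULL,
        event_date  TEXT NOT NULL,
        first_downs INTEGER          -- one of many unindexed stat columns
    )
""")
# Index only the column used for lookups; the stat columns stay unindexed,
# so they add nothing to the cost of each insert.
conn.execute("CREATE INDEX idx_games_event_date ON games (event_date)")

# Ask SQLite how it would run a lookup by date.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, home_team FROM games WHERE event_date = ?",
    ("2009-09-13",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last field is the description
print(plan_text)
```

If the printed plan mentions idx_games_event_date, the lookup is going through the index rather than scanning the whole table.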

With such a small table, none of it really matters too much. You can duplicate your data umpteen times without running into any space problems; you're in a privileged position.

MarkR
A: 

200K times 50 values is not a huge table. Don't worry about performance until you have such things as ease of use and freedom from self-contradiction under control.

There are a variety of reasons to decompose a table. Decomposing a table means splitting it into two or more tables with most columns going into only one table, and other columns going into more than one table (foreign keys).

Farrell mentioned normalization. The primary benefit of normalization is that it precludes certain kinds of update anomalies, including ones that allow contradictory facts to be stored in the same table. Storage benefits are secondary. Performance benefits, if present, are likely to be minor. Having said that, normalization is the most important thing you can learn about table design. If you violate normalization rules without understanding the consequences, you're flying blind.

If I were introduced to a database table with 40 columns or more and there were any kind of problem in the database (performance, corruption, or whatever), I'd look into whether that table could be further normalized, and what the costs and benefits of doing so would be.

There are a variety of reasons to partition a table. As reinierpost said, don't start worrying about partitions until you've got normalization under control.
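Here is a small sketch of the kind of update anomaly meant here. The tables, the "home city" column, and the team names are all invented for the example, and SQLite (via Python's sqlite3 module) stands in for MySQL so it can be run directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized: the team's city is repeated on every game row.
    CREATE TABLE games_flat (
        id        INTEGER PRIMARY KEY,
        home_team TEXT NOT NULL,
        home_city TEXT NOT NULL
    );
    INSERT INTO games_flat VALUES (1, 'Team A', 'Old City'), (2, 'Team A', 'Old City');

    -- Normalized: the city is stored once, in a teams table.
    CREATE TABLE teams (
        name TEXT PRIMARY KEY,
        city TEXT NOT NULL
    );
    CREATE TABLE games (
        id        INTEGER PRIMARY KEY,
        home_team TEXT NOT NULL REFERENCES teams(name)
    );
    INSERT INTO teams VALUES ('Team A', 'Old City');
    INSERT INTO games VALUES (1, 'Team A'), (2, 'Team A');
""")

# An update that misses a row leaves the flat table contradicting itself.
conn.execute("UPDATE games_flat SET home_city = 'New City' WHERE id = 1")
flat_cities = conn.execute(
    "SELECT DISTINCT home_city FROM games_flat"
    " WHERE home_team = 'Team A' ORDER BY home_city"
).fetchall()

# In the normalized design there is only one place to update.
conn.execute("UPDATE teams SET city = 'New City' WHERE name = 'Team A'")
normalized_cities = conn.execute(
    "SELECT DISTINCT t.city FROM games g JOIN teams t ON t.name = g.home_team"
).fetchall()
print(flat_cities, normalized_cities)
```

The flat table now reports two different cities for the same team, while the normalized tables cannot disagree because the fact is stored only once.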

Walter Mitty