views:

149

answers:

6

Hola,

The two tables below can both hold the same data - a full year, including some arbitrary info about each month

table1 (one row = one month)
------
id
month
year
info


table2 (one row = one year)
------
id
year
jan_info
feb_info
mar_info
apr_info
may_info
jun_info
jul_info
aug_info
sep_info
oct_info
nov_info
dec_info

Table A

  • Seems more intuitive because the month is numeric, but its
  • 10x more rows for a full year of data. Also the
  • Rows are smaller (less columns)

Table B

  • 10x less rows for a full year of data, but
  • Single rows are much larger
  • Possibly more difficult to add more arbitrary info for a month

In a real world test scenerio I set up, there were 12,000 rows in table1 for 10 years of data, where table2 had 150. I realize less is better, generally speaking, but ALWAYS? I'm afraid that im overlooking some caveat that ill find later if I commit to one way. I havent even considered disk usage or what query might be faster. What does MySQL prefer? Is there a "correct" way? Or is there a "better" way?

Thanks for your input!

+6  A: 

Don't think about how to store it, think about how you use it. And also think about how it might change in the future. The storage structure should reflect use.

The first option is more normalized by the second, so I would tend to prefer it. It has the benefit of being easy to change, for example if every month suddenly needed a second piece of information stored about it. Usually this kind of structure is easier to populate, but not always. Think about where the data is coming from.

If you're only using this data for reports and you don't need to aggregate data across months, use the second option.

It really depends on what the data is for and where it comes from. Generally, though, the first option is better.

Welbog
+1 for if every month needed a second piece of info.
ceejayoz
+3  A: 

12000 rows for 10 years of data? I say that scale pretty well since 12000 rows is next to nothing with a decent DBMS.

How are you using the database? Are you sure you really need to worry about optimizations?

If you need to store data that is specific to a month then you should absolutely store a row for each month. It's a lot cleaner approach compared to the one with a column for each month.

Daniel
+1 for 12k rows being miniscule.
ceejayoz
A: 

How are you using the data? If you are often doing a report that splits the data out by month, the second is easier (and probably faster but you need to test for yourself) to query. It is less normalized but but honestly when was the last time we added a new month to the year?

HLGEM
Uh, how would the second be easier? "`SELECT info FROM table1 WHERE month='09'`" seems about as easy as you can get...
ceejayoz
I think he meant that its easier if the data is going into a report with similar layout. im not sure i agree though because the queries ive been using for both tables are pretty simple
I mean if he wants to display multiple months in the same report. Select case when month = '09' then info else null end, case when mnonth = '10 then info else null end from table1 where year = '2009'viceselect septifo, octinfo from table1Of course it can get even mor complicated if you are using agreegate data for the months.
HLGEM
+1  A: 

"In a real world test scenerio I set up, there were 12,000 rows in table1 for 10 years of data, where table2 had 150."

How? There would have to be 80 months in a year for that to be the case.

ceejayoz
i didnt give details because its hard to relate that test, though i think the point is still valid, that its at least 10x more rows in table1
@stabby: irrelevant. RDBMS are optimised for large ammounts of data (rows). (Also, Access is **not** a propper RDBMS, and is the only place where I think there would be a problem with 12K+ rows)
voyager
+1  A: 

Since this is an optimising problem the optimising answer applies: It depends.

What do you want to do with your data?

Table A is the normal form in which one would store this kind of data.

For special cases Table B might come in handy, but I'd need to think hard to find a good example.

So either go with A or give us some details about what you want to do with the data.

A note on disc space: Total disc space is is a non issue, except for extremely huge tables. If at all discspace per select matters, and that should be less for the Table A design in most cases.

A note on math: if you divide 12000 by 12 and get 150 as an result, something is wrong.

Jens Schauder
A: 

In general I'd say one record per month as the more general solution.

One important issue is whether "info" is and must logically always be a single field. If there are really several pieces of data per month, or if it's at all likely that in the future there will be, than putting them all in one table gets to be a major pain.

Another question is what you will do with this data. You don't say what "info" is, so just for purposes of discussion let's suppose it's "sales for the month". Will you ever want to say, "In what months did we have over $1,000,000 in sales?" ? With one record per month, this is an easy query: "select year, month from sales where month_sales>1000000". Now try doing that with the year table. "select year, 'Jan' from year_sales where jan_sales>1000000 union select year, 'Feb' from year_sales where feb_sales>1000000 union select year, 'Mar' from year_sales where mar_sales>1000000 union ..." etc. Or maybe you'd prefer "select year, case when jan_sales>1000000 then 'Jan=yes' else 'Jan=no', case when feb_sales>1000000 then 'Feb=yes' else 'Feb=no' ... for the remaining months ... from year_sales where jan_sales>1000000 or feb_sales>1000000 or mar_sales>1000000 ..." Yuck.

Having many small records is not that much more of a resource burden than having fewer but bigger records. Yes, the total disk space requirement will surely be more because of per-record overhead, and index searches will be somewhat slower because the index will be larger. But the difference is likely to be minor, and frankly there are so many factors in database performance that this sort of thing is hard to predict.

But I have to admit that I just faced a very similar problem and went the other way: I needed a set of flags for each day of the week, saying "are you working on this day". I wrestled with whether to create a separate table with one record per day, but I ended up putting seven fields into a single record. My thinking is that there will never be additional data for each day without some radical change in the design, and I have no reason to ever want to look at just one day. The days are used for calculating a schedule and assigning due dates, so I can't imagine, in the context of this application, ever wanting to say "give me all the people who are working on Tuesday". But I can readily imagine the same data in a different application being used with precisely that question.

Jay