views:

702

answers:

10

I am setting up a table where I need the year and month. In MySQL I believe I have 2 options: (1) 2 fields: 1 for year, 1 for month or (2) a date field (the day would always be 1).

The 2 fields has the advantage of being some what faster (I think) because MySQL doesn't have to convert the value from a date to an integer although this is probably negligible. The date field has the advantage of "automatic" validation: someone can't get data into the db with the the month being 13 or the year being 1. With a date field you can also do date calculations more easily (ie, months between).

Which would you use? Or is there another you would use?

+11  A: 

Use a date field. Since sql supports date fields natively, its easy to filter for specific dates by using the WHERE clause.

The 2 fields has the advantage of being some what faster [...]

Your SELECT query is not your bottleneck so you shouldn't worry about this. Readability and a pragmatic program is more important than a "perceived bottleneck".

MrValdez
I agree with most but if the columns are named year and month , where is the readability issue?
Learning
SELECT [...] WHERE date between '02-01-2009' AND '04-31-2010' vs SELECT [...] WHERE Year BETWEEN 2009 AND 2010 AND Month BETWEEN 2 AND 4
MrValdez
You think that the latter has readability issues?
Learning
.. don't answer that. I guess it's subjective.
Learning
Those select statements are not identical. The second one only grabs months 2,3, and 4 for years 2009 and 2010; which is nowhere near the same thing as grabbing 14 months worth of data..
Chris Lively
@Chris I stand corrected. On second inspection, my SQL statement is flawed.
MrValdez
@But your point stands. Changing it to use MONTH(date) BETWEEN 2 AND 4 AND YEAR(date) BETWEEN 2009 and 2010 would make it equivalent. However getting the split mode to return a range is a bit trickier.
toast
+1 ... how about trying to find the records where the date range spans multiple years, using two columns? You'd probably do it by combining the two columns into one. Much more simple with a single date column, and easier to get the number of days between two values also.
David Aldridge
When I first read your answer, I disagreed, but you inadvertently proved your point by showing how easy it is to screw up the SQL when you're using separate month and year columns. :)
Kyralessa
+1  A: 

If you are going to run a lot of operations on the date field then I'd rip it apart into separate columns and deal with the data validation either in a table constraint or in the DAL.

For example, building sales reports by day, month, year are much more efficient when the fields are split. The reason being that you don't have to use datetime functions to rip apart the date for grouping.

If it's something like a birthday where I might query on it once in a while then I wouldn't worry about it and just leave it in a date field.

Chris Lively
+1  A: 

I would use the date field even if you only need the year and month you don't lose anything by gathering all the data. As a standard practice i always gather all data when ever possible.

Dedrick
A: 

Probably not because the smallest datetime data type in SQL Server (Microsoft) is smalldatetime which is 4 bytes long. If you need just month and year then you need 1 byte for month and 2 bytes for year.

David Pokluda
Probably not? I was an either or :P
Darryl Hein
+1  A: 

I would use separate columns, mainly because that would allow for better use of indexes. E.g., I don't think an index on a datetime col will help if you are just concerned with data from a given month (not year).

RedFilter
Unless there's something "special" about mysql it would. Or I should say that it "could".
David Aldridge
A: 

Although not immediately of use to you, IBM Informix Dynamic Server supports the type:

DATETIME YEAR TO MONTH

This stores exactly what you want - the year and month. It has its uses. The DATETIME family of types includes many other types that occasionally have their uses - and some that are of marginal utility, the canonical example being DATETIME MONTH TO MINUTE. (The downside of the type is the verbose notations needed to manipulate it, but there are many operations that can be done on any or all of the DATETIME types.)

In many DBMS, you can place constraints on columns, so if you go with a two-column approach, you would place a CHECK(month_column BETWEEN 1 AND 12) constraint on the column to ensure that the user did not place an invalid value in the table. You might even apply a constraint on the year column too.

Also, some DBMS allow you to create user-defined types, and a year-month type is pretty straight-forward as these go. The details depend on the DBMS, of course.

Jonathan Leffler
+1  A: 

Unless there is a specific performance benefit of storing the year and month seperately, I would stick with the date. Regarding indexing, if have you two columns, you will need to create an index on the combination of columns rather than one for the date column. The date will be converted internally into a long value so the storage space required is not really an issue.

Additionally, think of the possible maintenance pain with two fields. You would have two db fields, possibly two fields on an object or the need to build/parse month and year to/from the db. Keep it simple with a date and let the DB keep track of your data integrity.

I work with data like you described - expiration dates where day is always last day of the month so we only need month and year. We store these as a date.

aurealus
+1  A: 

I'd keep a datetime column and two computed columns with month and year (indexed of course). Have my cake and eat it too :)

Learning
+1  A: 

If you anticipate queries of the form "gimme all rows in July, regardless of the year," they'll be a little easier to write with separate month and year columns. A separate index for the month column should make it snappy.

Otherwise, I'd go for the single date column: simple, understood, built-in validation, and date math functions work. Your only worry is that someone new to the design will wonder why everything always occurs on the first of the month.

There is one other reason to use separate month and year columns that I've run into: when the month is not known. I've used that for apps that allow an upcoming event to be "sometime in 2009." In that case, using a NULL in the month column solves the problem nicely. There's no easy way to do that with a date-type column unless you come up with some horrible hack like the 2nd of January means the month is unknown.

yukondude
+1  A: 

Think about it this way: One day someone will come to you with a requirement to enhance the application with the ability to not only save year and month, but also a day. Would you then add an extra column for a day? And then, next thing, they might want you to also save time.

How easy would it be to enhance the functionality if you have separate columns for year/month/day? If you have a single date column?

I would go for a date column for this reason alone.

Evgeny