views:

404

answers:

7

We have to save the day and the month of a yearly reoccurring event. (Think of it as the day of they year when the invoice has to be sent to the client.)

We will have to do some computation with this information. E.g if the customer is paying twice per year. Then we do not only have to check if this day is today but also if this day is in 6 month.

We have now several options: a) We save the information in a DATE field but ignore the year. b) We create two int fields in the database date_day and date_month. c) We create a varchar field and then do a regex and split e.g. 31.12 up every time we do some calculation.

We did several tests and found out that c) is definitely too slow. But we still have the option a) and b). First we wanted to go for b) but after some tests we tend more to a).

Is there a good technical reason that would really make one option a lot better than the other?

(We are using MySQL at the moment. If this is of importance.)

+4  A: 

I'd choose b), because it more accurately reflects the meaning of the data. Having a data structure where some parts are supposed to be "ignored" is problematic. What if people just do a simple date comparison, assuming the year is always the same, but someone used a different placeholder year, assuming the year doesn't matter anyway?

Always use data structures that reflect your intentions.

That c) is bad is, I believe, beyond reasonable discussion :-). And I'm not thinking of performance reasons...

sleske
Agreed, having a data field where parts of it are supposed to be ignored are very bad and will most likely bite you in the future. I vote for option b.
DoctaJonez
Yes, the year doesn't matter. We already have a contract start date and a contract end date. We just need to calculate if a invoice has to be sent. And the first selection we are doing is to check if "today" is within the start date and the end date. Only then we check if we have to generate an invoice today.
Raffael Luthiger
A: 

I'd also side with b), but use TINYINT for month (0 to 255) and SMALLINT (-32,768 to 32,767) for year to save a bit of space.

Peter Bridger
A: 

Choose ONE int field like 1601 for January, 1st.

Kai
What's the significant difference between this option and "31.12'?
Adam Matan
calculating instead of string manipulation
Kai
Correct, but you could also store it as a real number.
Adam Matan
A: 

I'd choose b), because it would make queries much easier: you'll be able to restore all events in a range (December, specific day, day range) in a very easy manner. If you choose a) - don't forget to set the year to a specific one for comparison and extraction reasons.

Adam Matan
+3  A: 

I would use the date field and even though not needed, would still save the year as well. Just strip it out when you have to print it / use it. There are a few reasons:

  1. You might find out that a later point the customer does want you to save the date. In that case you don't have to do any changes to your database structure.
  2. You can use the SQL date functions to compare dates, if needed. If you have day and month in separate fields, you need a lot more code to e.g. calculate the difference between two dates (leap years etc).

The reasons give for choosing b) can also be easily solved with those SQL date functions. You can easily pick events in a certain month, for instance, in a single query.

Makis
Thanks for reminding me on the leap years. This made me realize that we forgot to treat the February differently. Maybe we even need something like "last day of month" as an option for the users. This comment helped me a lot. Even if we would decide to still go for b)
Raffael Luthiger
+1  A: 

I'd store the date of the first event, and then an interval for each subsequent event, kinda like most calendar apps. In this case, you'd structure it like this:

 first_event | interval | interval_unit
-------------+--------------------------
  2009-01-01 |        6 | 'month'
  2009-02-01 |        1 | 'year'

Unfortunately, MySQL doesn't have an INTERVAL datatype, so two columns and a bit of post-processing will be necessary, but I think it's the most flexible way to approach the problem.

Samir Talwar
We do have the interval and the interval_unit already. The question is more about how to save the first_event if the year is not "important". What I would like to know is if a date field is really the best choice for the post-processing. (Maybe my answer to "sleske" gives you more information.)
Raffael Luthiger
I think it will make calculations much easier if you do store the first event as a DATE. If you were always going to have intervals of one year, having just a MONTH and DAY column would make sense, but as your intervals may change (what if you want to bill someone every month?), I'd go with a calendar approach.
Samir Talwar
A: 

I'd go for the day-of-year number (e.g. a single number from 0 to 365, and then add that to the 1st of january of the particular year you are interested in.

If you don't want to do the extra math of the above solution, then use two fields one for month and one for day (but make sure you update both when you need to!).

Remember, you have to deal with leap years, so using a date field is a bad idea since you'd have to store dates with two years - one a leap year and one not - very complicated!

Lee Atkinson
Leap years are precisely why 'day of year' wouldn't work for this - if you stored '60' then in leap years you would pay on Feb 29, in non-leap years on Match 1 - real-world payment schedules don't do that.
AakashM
True, though it doesn depend on ones business rules. I think I would shift the day of year number back an dforth for days after the 28th to correct this - e
Lee Atkinson