views:

79

answers:

3

I've been hammering my head against my desk for the past few days on this, and so I turn to you, Stack Overflow.

The software I'm working on has time-sensitive data. The usual solution for this is effective and expiration dates.

EFF_DT      XPIR_DT     VALUE
2000-05-01  2000-10-31  100
2000-11-01  (null)      90

This would be easy. Unfortunately, we require data that repeats on a yearly basis arbitrarily far into the future. In other words, each May 1 (starting in 2000) we may want the effective value to be 100, and each November 1 we may want to change it to 90.

This may be true for a long time (>50 years), and so I don't want to just create a hundred records. I.e., I don't want to do this:

EFF_DT      XPIR_DT     VALUE
2000-05-01  2000-10-31  100
2000-11-01  2001-04-30  90
2001-05-01  2001-10-31  100
2001-11-01  2002-04-30  90
2002-05-01  2002-10-31  100
2002-11-01  2003-04-30  90
...
2049-05-01  2049-10-31  100
2049-11-01  2050-04-30  90
2050-05-01  2050-10-31  100
2050-11-01  2051-04-30  90

These values may also change with time. Values before 2000 might have been constant (no flip-flopping) and values for the coming decade may be different than the values for the last:

EFF_DT      XPIR_DT     REPEATABLE  VALUE
1995-01-01  2000-04-30  false       85
2000-05-01  2010-04-30  true        100
2000-11-01  2010-10-31  true        90
2010-05-01  (null)      true        120
2010-11-01  (null)      true        115

We already have a text file (from a legacy app) that stores data in a form very close to this, so there are benefits to adhering to this type of structure as closely as possible.

The question then comes on retrieval: which value would apply to today, 2010-03-09?

It seems that the best way to do this would be to find the most recent instance of each effective date (of all the active rows), then see which is the greatest.

EFF_DT      MOST_RECENT XPIR_DT     VALUE
2000-05-01  2009-05-01  2010-04-30  100
2000-11-01  2009-11-01  2010-10-31  90

The value for today would be 90, since 2009-11-01 is later than 2009-05-01.

On, say, 2007-06-20:

EFF_DT      MOST_RECENT XPIR_DT     VALUE
2000-05-01  2007-05-01  2010-04-30  100
2000-11-01  2006-11-01  2010-10-31  90

The value would be 100 since 2007-05-01 is later than 2006-11-01.

Using the MySQL date functions, what's the most efficient way to calculate the MOST_RECENT field?

Or, can anyone think of a better way to do this?

The language is Java, if it matters. Thanks all!

A: 

Here is a query that you can use to calculate the more recent EFF_DT for a data set. You will have to fill in there where clause because i'm not sure how this data is organized.

select EFF_DT  form  date_table where 1 order by EFF_DT desc limit 1

The flip flop of 90 and 100 is more complex, but you should be able to take care of this using the mysql data and time functions. This is a tricky one, and I'm not 100% on what you are trying to do. But, this query checks to see if the month of XPIR_DT is greater than May (the 5th month) but less than November (The 11th month). If this is true then the sql query will return 90, if its false then you'll get 100.

select if((month(XPIR_DT)>=5) and (month(XPIR_DT)<11),90,100) from date_table where id=1
Rook
+2  A: 

Suppose your wanted 'date' is '2007-06-20'.

You need to combine the non-repeating elements with the repeating ones, so you could do something like this (untested and probably needs some thinkering, but should give you the general idea):

select * from (
  select * from mytable 
  where 
    repeatable = false
    and 
    EFF_DT <= '2007-06-20' < XPIR_DT
  union all
  select * from mytable
  where
    repeatable = true
    and EFF_DT <= str_to_date(concat("2007", "-", month(EFF_DT), "-", day(EFF_DT)), "%Y-%m-%d") < XPIR_DT
)
order by EFF_DT desc limit 1
ChristopheD
Thanks... I'll put in some test data and give this a shot. I'll let you know how it turns out. :-)
Tenner
I was hoping to do this in one database hit -- retrieve all data, with the database supplying my Java code a "hint" as to which row to use. I think that I'm going to introduce all of your logic in the underlying Java code. There are a lot of smarts I need to put in that a MySQL one-liner just can't (or shouldn't) handle.Thanks for the help!
Tenner
+1  A: 

I've had to do similar things with recurring appointments & events, and you might find that MySQL will be a lot happier with the "static" date style that you don't want - each recurring instance spelled out in hundreds of rows.

If possible, I'd consider creating a separate table to store them flattened out, while keeping the effective/expires dates where they are (to match legacy data & act as a parent), and a 1:many relation between the two tables (i.e. an "event_id" on the flattened data referencing the original's PK). Writing all those records will obviously take longer, but it's directly lightening the load from reading them (where things generally need to be faster).

Creating a stored procedure or external program to handle recalculating a flat start_date / end_date / value table should be fairly basic, given a common interval. Querying the data could then be as simple as WHERE @somedate BETWEEN start_date AND end_date, instead of increasingly complex conversions & date math.

Again, INSERTs and UPDATEs will be slower, but "hundreds of rows" isn't even scratching the surface of what MySQL's capable of. If it's just 2 dates, an int, and some sort of int key, writing a few hundred records shouldn't take but a couple seconds on a sub-par server. If we were talking millions of records then maybe something could be tweaked (do you really need to track 50 years ahead or just the next 5? can recalculation be moved to off-peak times via cron? etc), but even then MySQL will just be that much more effective compared to calculating the difference every time.

Also maybe of interest: What's the best way to model recurring events in a calendar application? & Data structure for storing recurring events?

tadamson
Thanks for your input. One of the reasons I hesitate to expand out the data is that it may be necessary to edit these recurring data points. I.e., if value changes from 90 to 95, I'd like to change it in one place, not a hundred. And, the user interface needs to be able to "collapse" this data down to a reasonable form. (The user should see two rows, not 100.) Still, good food for thought here...
Tenner