views:

1859

answers:

10

Java & Oracle both have a timestamp type called Date. Developers tend to manipulate these as if they were calendar dates, which I've seen cause nasty one-off bugs.

  1. For a basic date quantity you can simply chop off the time portion upon input, i.e., reduce the precision. But if you do that with a date range, (e.g.: 9/29-9/30), the difference between these two values is 1 day, rather than 2. Also, range comparisons require either 1) a truncate operation: start < trunc(now) <= end, or 2) arithmetic: start < now < (end + 24hrs). Not horrible, but not DRY.

  2. An alternative is to use true timestamps: 9/29 00:00:00 - 10/1 00:00:00. (midnight-to-midnight, so does not include any part of Oct). Now durations are intrinsically correct, and range comparisons are simpler: start <= now < end. Certainly cleaner for internal processing, however end dates do need to be converted upon initial input (+1), and for output (-1), presuming a calendar date metaphor at the user level.

How do you handle date ranges on your project? Are there other alternatives? I am particularly interested in how you handle this on both the Java and the Oracle sides of the equation.

A: 

Based upon your first sentence, you're stumbling upon one of the hidden "features" (i.e. bugs) of Java: java.util.Date should have been immutable but it ain't. (Java 7 promises to fix this with a new date/time API.) Almost every enterprise app counts on various temporal patterns, and at some point you will need to do arithmetic on date and time.

Ideally, you could use Joda time, which is used by Google Calendar. If you can't do this, I guess an API that consists of a wrapper around java.util.Date with computational methods similar to Grails/Rails, and of a range of your wrapper (i.e. an ordered pair indicating the start and end of a time period) will be sufficient.

On my current project (an HR timekeeping application) we try to normalize all our Dates to the same timezone for both Oracle and Java. Fortunately, our localization requirements are lightweight (= 1 timezone is enough). When a persistent object doesn't need finer precision than a day, we use the timestamp as of midnight. I would go further and insist upon throwing away the extra milli-seconds to the coarsest granularity that a persistent object can tolerate (it will make your processing simpler).

Alan
Chris Noe
Wrapping Date provides better control when standardized across the project, but then there's always some pesky logic down in the database.
Chris Noe
+2  A: 

I use Oracle's date data type and educate developers on the issue of time components affecting boundary conditions.

A database constraint will also prevent the accidental specification of a time component in a column that should have none and also tells the optimizer that none of the values have a time component.

For example, the constraint CHECK (MY_DATE=TRUNC(MY_DATE)) prevents a value with a time other than 00:00:00 being placed into the my_date column, and also allows Oracle to infer that a predicate such as MY_DATE = TO_DATE('2008-09-12 15:00:00') will never be true, and hence no rows will be returned from the table because it can be expanded to:

MY_DATE = TO_DATE('2008-09-12 15:00:00') AND
TO_DATE('2008-09-12 15:00:00') = TRUNC(TO_DATE('2008-09-12 15:00:00'))

This is automatically false of course.

Although it is sometimes tempting to store dates as numbers such as 20080915 this can cause query optimization problems. For example, how many legal values are there between 20,071,231 and 20,070,101? How about between the dates 31-Dec-2007 abnd 01-Jan-2008? It also allows illegal values to be entered, such as 20070100.

So, if you have dates without time components then defining a range becomes easy:

select ...
from   ...
where  my_date Between date '2008-01-01' and date '2008-01-05'

When there is a time component you can do one of the following:

select ...
from   ...
where  my_date >= date '2008-01-01' and
       my_date  < date '2008-01-06'

or

select ...
from   ...
where  my_date Between date '2008-01-01'
                   and date '2008-01-05'-(1/24/60/60)

Note the use of (1/24/60/60) instead of a magic number. It's pretty common in Oracle to perform date arithmetic by adding defined fractions of a day ... 3/24 for three hours, 27/24/60 for 27 minutes. Oracle math of this type is exact and doesn't suffer rounding errors, so:

select 27/24/60 from dual;

... gives 0.01875, not 0.01874999999999 or whatever.

David Aldridge
Good to know about this type of constraint. How do you apply this to date ranges?
Chris Noe
Not sure I follow you Chris ... got an example situation?
David Aldridge
Per examples I gave in the question.
Chris Noe
Do you mean how do you apply theconstraints to the problem of applying date ranges in the query?
David Aldridge
Yes. Your constraint concept is helpful. I'm just curious which way you use the truncated "timestamps" to represent date ranges. One of the ways I described, or another?
Chris Noe
Does that edit help, Chris?
David Aldridge
Okay, so you are in the calendar date camp (#1). Thanks for all your example code. Looks like BETWEEN makes that painless on the Oracle side.
Chris Noe
A: 

Based on my experiences, there are four main ways to do it:

1) Convert the date to an epoch integer (seconds since 1st Jan 1970) and store it in the database as an integer.

2) Convert the date to a YYYYMMDDHHMMSS integer and store it in the database as an integer.

3) Store it as a date

4) Store it as a string

I've always stuck with 1 and 2, because it enables you to perform quick and simple arithmetic with the date and not rely on the underlying database functionality.

Steve M
Not sure I understand #2 conversion.
Chris Noe
+4  A: 

Oracle has the TIMESTAMP datatype. It stores the year, month, and day of the DATE datatype, plus hour, minute, second and fractional second values.

Here is a thread on asktom.oracle.com about date arithmetic.

EddieAwad
Hmm, how is that different from their Date type?
Chris Noe
I missed the fractional second value. I updated my answer.
EddieAwad
So, per asktom you like the #1 type of approach.
Chris Noe
A: 

Im storing all dates in milliseconds. I do not use timestamps/datetime fields at all.

So, i have to manipulate it as longs. It means i do not use 'before', 'after', 'now' keywords in my sql queries.

Vugluskr
Any valuable lessons learned for how this plays out in manipulating date ranges?
Chris Noe
A: 

All dates can be unambiguously stored as GMT timestamps (i.e. no timezone or daylight saving headaches) by storing the result of getTime() as a long integer.

In cases where day, week, month, etc. manipulations are needed in database queries, and when query performance is paramount, the timestamps (normalized to a higher granularity than milliseconds) can be linked to a date breakdown table that has columns for the day, week, month, etc. values so that costly date/time functions don't have to be used in queries.

Ates Goral
Any thoughts on date range manipulation in this regard?
Chris Noe
Date ranges have to be dealt with on the Java side, prior to generating the DB query which uses GMT timestamps. The day/week/month components are only used for grouping/summarization etc. by different time granularities.
Ates Goral
A: 

Alan is right- Joda time is great. java.util.Date and Calendar are just a shame.

If you need timestamps use the oracle date type with the time, name the column with some kind of suffix like _tmst. When you read the data into java get it into a joda time DateTime object. to make sure the timezone is right consider that there are specific data types in oracle that will store the timestamps with the timezone. Or you can create another column in the table to store the timezone ID. Values for the timezone ID should be standard full name ID for Timezones see http://java.sun.com/j2se/1.4.2/docs/api/java/util/TimeZone.html#getTimeZone%28java.lang.String%29 . If you use another column for the TZ dta then when you read the data into java use DateTime object but set the timezone on the DateTime object using the .withZoneRetainFields to set the timezone.

If you only need the date data (no timestamp) then use the date type in the database with no time. again name it well. in this case use DateMidnight object from jodatime.

bottom line: leverage the type system of the database and the language you are using. Learn them and reap the benefits of having expressive api and language syntax to deal with your problem.

Kyle Dyer
Any thoughts on date range manipulation in this regard?
Chris Noe
+4  A: 

Here's how we do it.

  1. Use timestamps.

  2. Use Half-open intervals for comparison: start <= now < end.

Ignore the whiners who insist that BETWEEN is somehow essential to successful SQL.

With this a series of date ranges is really easy to audit. The database value for 9/30 to 10/1 encompass one day (9/30). The next interval's start must equal the previous interval's end. That interval[n-1].end == interval[n].start rule is handy for audit.

When you display, if you want, you can display the formatted start and end-1. Turns out, you can educate people to understand that the "end" is actually the first day the rule is no longer true. So "9/30 to 10/1" means "valid starting 9/30, no longer valid starting 10/1".

S.Lott
Thanks. First response that directly addresses the question. (Big concur on the BETWEEN whiners.)
Chris Noe
@Chris Noe: Date time calculations are hard. There's a lot of hubris in knowing weird details about fractional seconds (or their lack) in various SQL time representations. I prefer the half-open comparison -- it's hard to get wrong.
S.Lott
+1  A: 

I don't see the Interval datatypes posted yet.

Oracle also has datatypes for your exact scenario. There are INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND datatypes in Oracle as well.

From the 10gR2 docs.

INTERVAL YEAR TO MONTH stores a period of time using the YEAR and MONTH datetime fields. This datatype is useful for representing the difference between two datetime values when only the year and month values are significant.

INTERVAL YEAR [(year_precision)] TO MONTH

where year_precision is the number of digits in the YEAR datetime field. The default value of year_precision is 2.

INTERVAL DAY TO SECOND Datatype

INTERVAL DAY TO SECOND stores a period of time in terms of days, hours, minutes, and seconds. This datatype is useful for representing the precise difference between two datetime values.

Specify this datatype as follows:

INTERVAL DAY [(day_precision)] TO SECOND [(fractional_seconds_precision)]

where

day_precision is the number of digits in the DAY datetime field. Accepted values are 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the SECOND datetime field. Accepted values are 0 to 9. The default is 6.

You have a great deal of flexibility when specifying interval values as literals. Please refer to "Interval Literals" for detailed information on specify interval values as literals. Also see "Datetime and Interval Examples" for an example using intervals.

Ah. I wasn't aware of INTERVAL, (apparently introduced in Ora 9i). Of course the architecture question then is how these map to the Java layer.
Chris Noe
Yeh, I don't know much about Java. Sorry... but other than that, this is a pretty cool feature that I've never had a use for. I was psyched to get to mention it.
+1  A: 

I second what S.Lott explained. We have a product suite which makes extensive use of date time ranges and it has been one of our lessons-learned to work with ranges like that. By the way, we call the end date exclusive end date if it is not part of the range anymore (IOW, a half open interval). In contrast, it is an inclusive end date if it counts as part of the range which only makes sense if there is no time portion.

Users typically expect input/output of inclusive date ranges. At any rate, convert user input as soon as possible to exclusive end date ranges, and convert any date range as late as possible when it has to be shown to the user.

On the database, always store exclusive end date ranges. If there is legacy data with inclusive end date ranges, either migrate them on the DB if possible or convert to exclusive end date range as soon as possible when the data is read.

deepc