views:

141

answers:

1

Hello!

I'm planning a distributed system of applications that will communicate with different types of RDBMS. One of the requirements is consistent handling of DateTimes across all RDBMS types. All DateTime values must be at millisecond precision, include the TimeZone info and be stored in a single column.

Since different RDBMS's handle dates and times differently, I'm worried I can't rely on their native column types in this case and so I'll have to come up with a different solution. (If I'm wrong here, you're welcome to show me the way.)

The solution, whatever it may be, should ideally allow for easy sorting and comparisons on the SQL level. Other aspects, such as readability and ability to use SQL datetime functions, are not important, since this will all be handled by a gateway service.

I'm toying with an idea of storing my DateTime values in an unsigned largeint column type (8 bytes). I haven't made sure if all RDBMS's in question (MSSQL, Oracle, DB2, PostgreSQL, MySQL, maybe a few others) actually /have/ such a type, but at this point I just assume they do.

As for the storage format... For example, 2009-01-01T12:00:00.999+01:00 could be stored similar to ?20090101120000999??, which falls in under 8 bytes.

The minimum DateTime I'd be able to store this way would be 0001-01-01T00:00:00.000+xx:xx, and the maximum would be 8000-12-31T23:59:59.999+xx:xx, which gives me more than enough of a span.

Since maximum unsigned largeint value is 18446744073709551615, this leaves me with the following 3 digits (marked by A and BB) to store the TimeZone info: AxxxxxxxxxxxxxxxxxBB.

Taking into account the maximum year span of 0001..8000, A can be either 0 or 1, and BB can be anywhere from 00 to 99.

And now the questions:

  • What do you think about my proposed solution? Does it have merit or is it just plain stupid?

  • If no better way exists, how do you propose the three remaining digits be used for TimeZone info best?

Thank you very much for your help in advance!

+1  A: 

I would suggest you to store the datetime information in milliseconds since 1970 (Java style) . It's a standard way for storing datetime information, in addition it's more efficient in terms of space than your suggestion. Because in your suggestion some digits are "wasted" i.e. the month digits can store only 00-12 (instead of 00-99) and so on. You didn't specify what is your development language but I am sure you can find many code snippets that transform date to milliseconds. If you are developing in .NET they have a similar concept of ticks. (you can use this information as well)

Regarding the time zone,I would have add another column to store only the TimeZone indication.

Remember that any format you choose should maintain consistency between two dates, i.e. if D1 > D2 then format(D1)>format(D2) , this way you can query the DB for changes since some date, or query for changes between two dates

LiorH
Num. of milliseconds since 1970 translates to using 6 bytes for current DateTime values, which is exactly the same in efficiency as my proposed 8 bytes, regardless of wasted digits.About your last paragraph on consistency and querying... Do you believe my proposed solution doesn't cover those? Thx
aoven
How do 6 bytes the same as 8 bytes. Regarding the your question I didn't fully understand what does the first digit used for? Because it will determine the comparison result between 2 dates.
LiorH
Well, I know of no numeric column type that would take 6 bytes. It's either 4 (int) or 8 (largeint), nothing in between. So if something takes 5 or 6 bytes, you need to go with largeint anyway.As for the leading digit: you make a good point. I guess this rules it out automatically.
aoven