I've got to validate numerous dates with my current project. Unfortunately, these dates can vary wildly. Examples include:

  1. 1983-07-10 (After 1970)
  2. 1492-10-11 (Before 1970, the Unix epoch, which rules out strtotime() on some systems)
  3. 200 B.C. (Really old...)

Dates will not be earlier than 9999 B.C., nor will they be in the future (beyond 'today'). What would be the best way to validate that the values submitted are indeed dates, and proper dates at that?

Updates...

All dates must be sortable within their global list, meaning dates 1 and 3 above must be comparable to one another and sortable ASC or DESC.

I'm fully aware of the calendar changes that have taken place in the past, and the confusion around these changes. My project assumes the user has already performed the proper calibration to find out the date according to our modern calendar system. I won't be performing this calibration for them.

A: 

what about strtotime() ?

dusoft
I don't think you read the whole question.
Tim Sylvester
i did. strtotime can validate a part of the date formats he expects. use that in connection with some regular expression and you are set.
dusoft
yeah, go on, downvote me, your ego can't accept multiple opinions.
dusoft
@dusoft You don't need to insult anybody. I down-voted your answer because it suggested the very thing I stated wouldn't suffice in my question.
Jonathan Sampson
Not my downvote, but you should qualify your answer with the information in the comment, as readers cannot guess your implementation of the solution from the verbiage you provided. I believe they are downvoting an incomplete answer, not a wrong one.
Matthew Vines
+5  A: 

How about a series of carefully-written regular expressions that recognize each possible format. Once you know the format, you can validate and perhaps put it into a uniform representation (e.g., 64-bit time_t).

e.g.,

/^(\d{4})-(\d{2})-(\d{2})$/
/^(\d+)\s*(bc|b\.c\.|bce|b\.c\.e\.?)$/i
etc.

Since it sounds like each form has its own validation rules, and you're not implementing any widely-available standard, I think you're stuck validating each case separately.
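As a rough illustration of this approach, the PHP sketch below tries each pattern in turn and reduces a match to a [year, month, day] triple, with BC years stored as negative numbers. The function name parse_loose_date and the choice to map a bare BC year to 1 January are assumptions made for the example; the question does not specify either.

```php
<?php
// Hypothetical helper: recognise one of the known formats and return
// [year, month, day], or null when nothing matches. BC years are negative.
function parse_loose_date(string $input): ?array
{
    $input = trim($input);

    // ISO-style AD date, e.g. "1492-10-11".
    if (preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $input, $m)) {
        $year  = (int) $m[1];
        $month = (int) $m[2];
        $day   = (int) $m[3];
        // checkdate() takes month, day, year and rejects years below 1.
        return checkdate($month, $day, $year) ? [$year, $month, $day] : null;
    }

    // Year-only BC date, e.g. "200 B.C." or "200bce" (at most four digits).
    if (preg_match('/^(\d{1,4})\s*(?:bc|b\.c\.|bce|b\.c\.e\.?)$/i', $input, $m)) {
        $year = (int) $m[1];
        // Assumption: a bare BC year stands for 1 January of that year.
        return $year >= 1 ? [-$year, 1, 1] : null;
    }

    return null; // unknown format
}
```

Each additional input format would get its own branch with its own validation rules, which is the per-case handling described above.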

Update:

All dates must be sortable within their global list.

It seems to me that in order to be able to sort dates that appear in different formats you would need a uniform representation for each one internally, as I mentioned before. For example, use a multi-key dictionary (std::multimap in C++, not sure about PHP) to store (uniform representation)->(input representation) mappings. Depending on the implementation of the container, you may get reverse lookups or key ordering for free.

Tim Sylvester
A: 

What is most important, I think, is to list all the possibilities (or group them somehow) and prepare a regular expression for every option, then identify and handle each input on that basis.

shazarre
+1  A: 

You may consider implementing your own custom DateTime type class. I'm not sure what all your requirements are, but I could see it having properties for BC/AD, formatting, etc. With a little thought it shouldn't be much harder than implementing a Money type class if that is familiar for you.

The reason I suggest this is that 200 BC and 1492-10-07 are vastly different, even format wise. Speaking off the cuff, if you treat BC < 0 < AD you may be able to get the calculations out of it you need as well.
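One way to picture the BC < 0 < AD idea: if every date is reduced to a [year, month, day] array with BC years negative, PHP's built-in array comparison already orders them chronologically. The sample values come from the question; storing a year-only BC date as 1 January is an assumption made for this sketch.

```php
<?php
// Dates as [year, month, day]; BC years are negative (assumed convention).
$dates = [
    [1983, 7, 10],
    [-200, 1, 1],   // 200 B.C., month and day assumed
    [1492, 10, 11],
];

// Arrays of equal length compare element by element, so the spaceship
// operator orders by year, then month, then day.
usort($dates, function (array $a, array $b): int {
    return $a <=> $b;
});

// $dates is now ordered: 200 B.C., 1492-10-11, 1983-07-10.
```

A custom class could wrap the same triple and expose the comparison as a method, which leaves room for formatting and BC/AD display logic.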

Matthew Vines
+2  A: 

What about using Zend_Date? Zend's date library is a very good date utility library. It can work standalone or with other Zend libraries, and it works with date_default_timezone_set(), so dates are automatically parsed for the set timezone. It also works for dates outside of the Unix timestamp range. It can be a little long-winded to write sometimes, but its strengths greatly outweigh its weaknesses.

You may have to implement your own custom parsing for BC/AD as I'm not sure it would work for that, but it might be worth a try.

Pear also has a date library that might be worth looking at, however, I haven't used it and have heard from a lot of people that they prefer Zend_Date to Pear's Date package.

You could always write your own, but why re-invent the wheel? If it doesn't roll the way you want, take it and improve upon it ;)

Tres
A: 

Since you are in control of the input interface, without loss of generality we can assume that there will be separate year/month/day integers (properly validate for... being integer :). Let's say that year will be negative to indicate BC.

So first of all... the obvious (partial) answer: checkdate(). This is just fine for years >= 1, as the function documentation says.

You're therefore stuck with the problem of what to do if year <= 0.

Let's make a side-trek here and see why that might be a BIG problem...

According to the Wikipedia link above, the Julian calendar came into effect in 45 BC. This calendar is, for all practical purposes, identical to the Gregorian calendar we use today. The difference is that there is a ten-day offset between them; the last day of the Julian calendar was Thursday, 4 October 1582 and this was followed by the first day of the Gregorian calendar, Friday, 15 October 1582 (the cycle of weekdays was not affected).

This already means that dates in the range 5 Oct 1582 to 14 Oct 1582 (inclusive) are invalid if you are following the Gregorian calendar; they have never existed.
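A minimal sketch of that check for AD dates, assuming the separate integer year/month/day inputs described earlier. The function name is_valid_ad_date is made up for illustration; checkdate() is the built-in mentioned above, and it takes its arguments in month, day, year order.

```php
<?php
// Hypothetical validator for AD dates given as separate integers.
function is_valid_ad_date(int $year, int $month, int $day): bool
{
    // checkdate() applies Gregorian rules and only accepts years >= 1.
    if (!checkdate($month, $day, $year)) {
        return false;
    }
    // Reject the ten days skipped at the Julian-to-Gregorian switch.
    if ($year === 1582 && $month === 10 && $day >= 5 && $day <= 14) {
        return false;
    }
    return true;
}
```

Whether to reject the October 1582 gap at all is a judgment call: if users are expected to enter dates already converted to the modern calendar, those ten days may be perfectly acceptable input.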

Going backward from there, you're good until 45 BC. From 46 BC backwards, the Roman calendar was used instead of the Julian.

I'm not going to go into that mess here, but simply mention that since that calendar was quite different from the Gregorian, your users will not be prepared to see a "Roman calendar date input form". My suggestion is, better make your app usable than technically correct.

If it can be assumed that nobody in their right mind would actually know a BC date to the day, or know how to properly specify it even if they did, you might arbitrarily assume that all dates BC are of the form 1/1/YEAR. Your interface might therefore disable the month/day controls if a "BC" checkbox was checked, have separate group boxes for BC and AD, or anything else appropriate.

The only remaining problem after all this, as I see it, is checking dates for leap years. Those were introduced with the Julian calendar, but not actually implemented correctly until 8 AD.

The last link above documents that during 45 BC - 4 AD (inclusive) leap years were not calculated correctly. An is-leap-year function that accounts for that inconsistency, plus the Julian/Gregorian switch, would be:

define('YEAR_JULIAN_CALENDAR_INTRODUCED', -45);
define('YEAR_JULIAN_CALENDAR_LEAP_IMPLEMENTED_CORRECTLY', 8);
define('YEAR_GREGORIAN_CALENDAR_INTRODUCED', 1582);

function is_leap_year($year) {
    if($year < YEAR_JULIAN_CALENDAR_INTRODUCED) {
        return false; // or good luck :)
    }
    if($year < YEAR_JULIAN_CALENDAR_LEAP_IMPLEMENTED_CORRECTLY) {
        // Early Julian period: every third year up to 9 BC, then none until 8 AD
        return $year <= -9 && $year % 3 == 0;
    }
    if($year < YEAR_GREGORIAN_CALENDAR_INTRODUCED) {
        return $year % 4 == 0;
    }
    // Otherwise, Gregorian is in effect
    return $year % 4 == 0 && ($year % 100 != 0 || $year % 400 == 0);
}

Armed with this, you could then write a function that correctly tells you how many days there are in each year. Date subtraction/addition could then be built on that.

After all this discussion (I do admire the courage of anyone who has read this far :) I have to ask:

How much accuracy do you actually need?

If you decide that you need to be anal about the "technical details", I would personally implement the functions mentioned above, and then: a) use them as my handcrafted date library, or b) use them to check that any third-party library I'm interested in is actually implemented correctly.

If you don't need to do that, just pretend you never read all this. :)

Jon
This is essentially the content of a long discussion I had with another person involved in the development of this application. In the end, we decided to let the users calibrate their own dates :)
Jonathan Sampson
A: 

Second answer, after Jonathan's question update:

For straightforward date comparison, you'd need to use something integer-like, or a class library that supports dates back to 9999 BC (I don't know of one).

You could simply specify times as the number of seconds since 1/1/10000 BC (roll your own epoch); 64 bits would be more than enough for that. To do that, you need to solve one or two problems.

A. How to do 64-bit ints in PHP.

PHP only guarantees 32-bit signed integers, i.e. 31 bits of magnitude (64-bit builds provide more; check PHP_INT_SIZE). You could therefore do one of the following:

  1. Write your own 62-bit-integer class, which stores the bits in two private integer members. 62 bits are also more than enough.

    This would be painful to write, but probably fast. Major advantage: you would not be dependent on any PHP extension.

  2. Use BCMath or GMP to do arbitrary-precision integers.

    I'd try this first, if portability isn't a must. It could prove to be slower than acceptable, though. Major advantage: you don't risk getting the bit-fiddling code wrong.

With the 60-or-so-bit integer class in hand (supporting addition, subtraction and comparison through methods or helper functions), you can then write a CustomDateTime class that supports all your required logic. This class would contain the date-to-int and int-to-date conversion code (e.g. in its constructor); operations that concern only the integer value (e.g. comparison) would simply be forwarded to the integer class.
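To make the shape of such a class concrete, here is a simplified sketch. It deliberately swaps in a different unit: it counts days rather than seconds from a custom epoch, so the value fits in a native PHP int and no big-integer class is needed. The class name SortableDate, the negative-year convention for BC, and the use of Gregorian rules for all years (in line with the question's statement that users calibrate dates to the modern calendar) are all assumptions of the example.

```php
<?php
// Hypothetical sketch: a date reduced to one sortable integer (a day count
// from an arbitrary fixed point before 9999 BC).
final class SortableDate
{
    /** @var int */
    private $dayNumber;

    public function __construct(int $year, int $month, int $day)
    {
        if ($year === 0) {
            throw new InvalidArgumentException('There is no year 0; use -1 for 1 BC.');
        }
        // Input has no year zero (1 BC = -1); shift to astronomical
        // numbering, where 1 BC is year 0.
        $y = $year < 0 ? $year + 1 : $year;
        // Adding 10000 (a multiple of 400) keeps every supported year
        // positive without disturbing the Gregorian leap-year cycle.
        $y += 10000;
        // Count the year from March so the leap day falls at its end.
        if ($month <= 2) {
            $y -= 1;
            $month += 12;
        }
        $this->dayNumber = 365 * $y
            + intdiv($y, 4) - intdiv($y, 100) + intdiv($y, 400) // leap days
            + intdiv(153 * ($month - 3) + 2, 5)                 // days before this month
            + $day - 1;
    }

    public function toInt(): int
    {
        return $this->dayNumber;
    }

    // Negative, zero or positive, like the <=> operator.
    public function compareTo(SortableDate $other): int
    {
        return $this->dayNumber <=> $other->dayNumber;
    }
}
```

The constructor does no validation beyond the year-zero check, so it would be paired with whatever date validation is chosen. The value returned by toInt() could also be stored in an ordinary integer column for sorting in the database.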

B. How to do 64-bit ints in the database.

All databases do that without problems. You almost surely need to go this route though, because e.g. MySQL doesn't support dates before 1000 AD. Don't know about other vendors.

Jon