views:

1468

answers:

4

I'd love to know if there is a module to parse "human formatted" dates in Perl. I mean things like "tomorrow", "Tuesday", "next week", "1 hour ago".

My research on CPAN suggests that there is no such module, so how would you go about creating one? NLP is way over the top for this.

+16  A: 

Date::Manip does exactly this.

Here is an example program:

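#!/usr/bin/perl

use strict;
use warnings;
use Date::Manip;

# Parse each line after __DATA__ and print it in a fixed format.
while (<DATA>)
{
  chomp;
  # UnixDate parses and formats in one call; fall back to a marker
  # if it cannot make sense of the string.
  my $formatted = UnixDate($_, "%Y-%m-%d %H:%M:%S") || "unparseable";
  print "$formatted ($_)\n";
}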

__DATA__
today
yesterday
tomorrow
last Tuesday
next Tuesday
1 hour ago
next week

Which results in the following output:

2008-11-17 15:21:04 (today)
2008-11-16 15:21:04 (yesterday)
2008-11-18 15:21:04 (tomorrow)
2008-11-11 00:00:00 (last Tuesday)
2008-11-18 00:00:00 (next Tuesday)
2008-11-17 14:21:04 (1 hour ago)
2008-11-24 00:00:00 (next week)

UnixDate is one of the functions provided by Date::Manip. Its first argument is a date/time in any format the module supports; the second describes how to format the result. Other functions just parse these "human" dates without formatting them, for use in delta calculations and the like.
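As a rough illustration of the parse-only route, here is a sketch using ParseDate and DateCalc, two other Date::Manip functions. It assumes Date::Manip is installed; the phrases and the "+ 3 days" delta are arbitrary choices for the example, and the exact delta string printed may differ between Date::Manip versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Date::Manip;

# ParseDate only parses: it returns Date::Manip's internal date string,
# or an empty string when the input is not understood.
my $start = ParseDate('last Tuesday');
my $end   = ParseDate('tomorrow');
die "could not parse input\n" unless $start && $end;

# DateCalc(date, date) returns the delta between them;
# DateCalc(date, delta) returns a new date.
my $delta    = DateCalc($start, $end);
my $deadline = DateCalc($start, '+ 3 days');

# UnixDate is only needed at the end, to format the results.
print 'start:    ', UnixDate($start,    '%Y-%m-%d %H:%M:%S'), "\n";
print 'delta:    ', $delta, "\n";
print 'deadline: ', UnixDate($deadline, '%Y-%m-%d %H:%M:%S'), "\n";
```

Keeping the parsed value in Date::Manip's own format until the end means you can chain several DateCalc calls before formatting once.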

Robert Gamble
Ah, good old Date::Manip... How can you not love a module that tries so hard to talk you out of using it?
Dave Sherohman
Exactly what I was looking for, but (as usual) I didn't know how to phrase the question. Thanks.
andymurd
draegtun
A: 

I assume you have some context around the text; how would NLP help here? As a rough approach, you could find the nearest date that is given exactly (not relative to today) and interpret today/tomorrow/yesterday relative to that.
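A minimal sketch of that idea in core Perl, assuming the surrounding text contains absolute dates in yyyy-mm-dd form and that "nearest" means smallest character distance. `resolve_relative` is a hypothetical helper, and only "yesterday", "today" and "tomorrow" are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Day offsets for the only relative words this sketch understands.
my %OFFSET = (yesterday => -1, today => 0, tomorrow => 1);

# Hypothetical helper: resolve each relative word in $text against the
# absolute yyyy-mm-dd date that appears closest to it in the text.
sub resolve_relative {
    my ($text) = @_;

    # Collect every absolute date with its character offset.
    my @anchors;
    while ($text =~ /\b(\d{4})-(\d{2})-(\d{2})\b/g) {
        push @anchors, { pos => $-[0], y => $1, m => $2, d => $3 };
    }
    return () unless @anchors;

    my @resolved;
    while ($text =~ /\b(yesterday|today|tomorrow)\b/gi) {
        my ($word, $pos) = (lc $1, $-[0]);

        # The anchor with the smallest character distance wins.
        my ($best) = sort { abs($a->{pos} - $pos) <=> abs($b->{pos} - $pos) } @anchors;

        # Do the arithmetic in UTC, where every day is exactly 86_400 seconds.
        my $epoch = timegm(0, 0, 0, $best->{d}, $best->{m} - 1, $best->{y})
                  + $OFFSET{$word} * 86_400;
        push @resolved, [ $word, strftime('%Y-%m-%d', gmtime $epoch) ];
    }
    return @resolved;
}

my $note = 'Build cut on 2008-11-17; we ship tomorrow.';
for my $pair (resolve_relative($note)) {
    print "$pair->[0] => $pair->[1]\n";    # tomorrow => 2008-11-18
}
```

Extending it to "last Tuesday" or "1 hour ago" would mean adding more patterns alongside %OFFSET, which is where a module such as Date::Manip starts to pay off.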

xxxxxxx
+8  A: 

You may also find it interesting to look at the DateTime::Format family, specifically DateTime::Format::Natural. Once you've parsed your date/time into a DateTime object, you can manipulate and evaluate it in a whole bunch of different ways.

Here's a sample program:

use strict;
use warnings;

use DateTime::Format::Natural;

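my $parser = DateTime::Format::Natural->new;

# Read one phrase per line from STDIN (or from files named on the command line).
while ( <> ) {

    chomp;
    my $dt = $parser->parse_datetime( $_ );

    # success() reports whether the last parse worked; error() explains why not.
    if ( $parser->success ) {

        print join( ' ', $dt->ymd, $dt->hms ) . "\n";
    }
    else {

        print $parser->error . "\n";
    }
}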

Output (each phrase was typed on standard input and is followed by the parsed result):

tomorrow  
2008-11-18 21:48:49  
next Tuesday  
2008-11-25 21:48:53  
1 week from now  
2008-11-24 21:48:57  
1 hour ago  
2008-11-17 20:48:59
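To give a feel for the "manipulate and evaluate" part, here is a short sketch of what you can do with the returned DateTime object. It assumes DateTime and DateTime::Format::Natural are installed; the phrase and the 3-day/2-hour offset are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
use DateTime::Format::Natural;

my $parser = DateTime::Format::Natural->new;
my $dt     = $parser->parse_datetime('next Tuesday');
die $parser->error, "\n" unless $parser->success;

# Arithmetic: clone first, because add() modifies the object in place.
my $later = $dt->clone->add(days => 3, hours => 2);

# Accessors and formatting.
print 'parsed:   ', $dt->ymd, ' (', $dt->day_name, ")\n";
print 'later:    ', $later->strftime('%Y-%m-%d %H:%M'), "\n";

# Comparison: DateTime->compare returns -1, 0 or 1.
my $cmp = DateTime->compare($later, $dt);
print 'later is after parsed: ', ($cmp > 0 ? 'yes' : 'no'), "\n";
```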

TMTOWTDI :)

-steve

hakamadare
Thanks for suggesting DateTime. Yay, One True Systems!
Anirvan
+2  A: 

Personally, I've always used Time::ParseDate for this. It understands pretty much every format I've tried.

Absolute date formats:

    Dow, dd Mon yy
    Dow, dd Mon yyyy
    Dow, dd Mon
    dd Mon yy
    dd Mon yyyy
    Month day{st,nd,rd,th}, year
    Month day{st,nd,rd,th}
    Mon dd yyyy
    yyyy/mm/dd
    yyyy-mm-dd      (usually the best date specification syntax)
    yyyy/mm
    mm/dd/yy
    mm/dd/yyyy
    mm/yy
    yy/mm      (only if year > 12, or > 31 if UK)
    yy/mm/dd   (only if year > 12 and day < 32, or year > 31 if UK)
    dd/mm/yy   (only if UK, or an invalid mm/dd/yy or yy/mm/dd)
    dd/mm/yyyy (only if UK, or an invalid mm/dd/yyyy)
    dd/mm      (only if UK, or an invalid mm/dd)

Relative date formats:

    count "days"
    count "weeks"
    count "months"
    count "years"
    Dow "after next"
    Dow "before last"
    Dow                     (requires PREFER_PAST or PREFER_FUTURE)
    "next" Dow
    "tomorrow"
    "today"
    "yesterday"
    "last" dow
    "last week"
    "now"
    "now" "+" count units
    "now" "-" count units
    "+" count units         
    "-" count units
    count units "ago"

Absolute time formats:

    hh:mm:ss[.ddd] 
    hh:mm 
    hh:mm[AP]M
    hh[AP]M
    hhmmss[[AP]M] 
    "noon"
    "midnight"

Relative time formats:

    count "minutes"         (count can be fractional: "1.5" or "1 1/2")
    count "seconds"
    count "hours"
    "+" count units
    "+" count
    "-" count units
    "-" count
    count units "ago"

Timezone formats:

    [+-]dddd
    GMT[+-]d+
    [+-]dddd (TZN)
    TZN

Special formats:

    [ d]d/Mon/yyyy:hh:mm:ss [[+-]dddd]
    yy/mm/dd.hh:mm
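A short sketch of calling it, assuming Time::ParseDate is installed. `parsedate` returns seconds since the epoch; the options used here are NOW (the reference time for relative phrases), PREFER_FUTURE (resolve a bare weekday forward) and WHOLE (reject strings that only partly parse). The phrases are just examples taken from the lists above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::ParseDate;

# Pin "now" so every phrase is resolved against the same instant.
my $now = time;

for my $phrase ('tomorrow', 'next Tuesday', '1 hour ago', 'now + 4 hours', 'not a date') {
    # In list context parsedate returns the epoch seconds plus either the
    # unparsed remainder or an error message.
    my ($seconds, $msg) = parsedate($phrase, NOW => $now, PREFER_FUTURE => 1, WHOLE => 1);
    if (defined $seconds) {
        print strftime('%Y-%m-%d %H:%M:%S', localtime $seconds), " ($phrase)\n";
    }
    else {
        print "could not parse '$phrase': ", ($msg // 'unknown error'), "\n";
    }
}
```

Because it returns a plain epoch number, the result drops straight into localtime, strftime or any arithmetic in seconds.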
cjm