I have an array of n strings in YYYY-MM-DD format (for example, "2010-10-31").

How do I compare a date to the strings in this array?

For example, how do I delete the strings that are more than 30 days old?

+2  A: 

A good start is to read The Many Dates of Perl and the DateTime site.

The YYYY-MM-DD format is a form of ISO 8601 date representation. There are variants of it that are considered acceptable, such as YYYY-MM-DD and YYYYMMDD and even YYMM in older data. You should look at a definitive list before you choose a method to compare these dates.

If ISO 8601 date strings are 1) valid dates, 2) all in the same format (either all with or all without the - delimiter), and 3) free of leading and trailing whitespace, they have an attractive property: you can sort or compare them with simple lexicographical string comparisons.
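
As a minimal sketch of that property (the sample data and the `is_iso_date` helper are illustrative assumptions, not part of the original answer), the following checks that each string has the strict YYYY-MM-DD shape and then relies on plain string operators:

```perl
use strict;
use warnings;

# Hypothetical helper: true only for a strict YYYY-MM-DD shape.
# It checks the format, not whether the calendar date exists.
sub is_iso_date {
    my ($s) = @_;
    return $s =~ /\A\d{4}-\d{2}-\d{2}\z/ ? 1 : 0;
}

my @dates  = ('2010-10-31', '2009-12-01', '2010-02-15');   # sample data
my $target = '2010-03-01';

# A plain string sort is a chronological sort for same-format ISO 8601 dates.
my @sorted = sort grep { is_iso_date($_) } @dates;

# "lt" compares as strings, which here means "earlier than".
my @before = grep { $_ lt $target } @sorted;

print "sorted: @sorted\n";
print "before $target: @before\n";
```

The anchored regex rejects strings with stray whitespace or other delimiters, which are exactly the cases where the string comparison would silently give the wrong answer.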

In general then:

  1. IFF you aren't going to check whether the dates are valid, and IFF they are all in the same format, and IFF there is no leading or trailing whitespace, you can compare against another string representing the target date in that same format.

--- Otherwise ---

  1. Decide on a CPAN module to parse your date string (or match it yourself),

  2. Convert to epoch time if your dates are in that range (or use a CPAN module that does larger-scale date/time manipulation, like Date::Manip or Date::Calc),

  3. Perform the arithmetic on the type of time (epoch time, absolute days, whatever)

  4. Convert the time back into the format that you want...

Here is code that does that:

use warnings; use strict;
use Date::Calc qw/:all/;

my (@date_strings, @abs_days);

my $target=Date_to_Days(2010, 1, 15);

# fill @date_strings with zero-padded "YYYY-MM-DD" strings for every day in a range
for my $AbsDay(Date_to_Days(2009,1,1)..Date_to_Days(2011,12,31)) {
   my ($year, $mon, $day)=Add_Delta_Days(1,1,1,$AbsDay-1);
   my $s=sprintf("%04d-%02d-%02d", $year, $mon, $day);
   push @date_strings, $s;
}

foreach my $s (@date_strings) {
    my ($year, $mon, $day);

    if(($year, $mon, $day)=$s=~/^(\d+)-(\d+)-(\d+)/) {
        my $days=Date_to_Days($year, $mon, $day);
        push @abs_days, $days 
             if ($target-$days <= 30 && $target-$days >= -30 );
    }
}

print "absolute day way:\n";
foreach my $days (@abs_days) {
    my ($year, $mon, $day)=Add_Delta_Days(1,1,1,$days-1);
    printf "%04d-%02d-%02d\n", $year, $mon, $day;
}
drewk
Too much work! :)
brian d foy
@brian d foy: Respectfully, I disagree. My code here is not that beautiful, to be sure, but using a well-tested CPAN library like Date::Calc (written in C) or DateTime is worth the overhead. Most of the work here is just generating the strings to test! :-} With the lexicographical solution you are advocating, you still need to check (or hope) that your input has the same delimiters, is a valid date, and does not have leading or trailing whitespace. Once you do that, is it really less work?
drewk
Yes, I've been in situations where it's a lot less work, and the difference between finishing in a reasonable time and never.
brian d foy
+3  A: 
use strict; use warnings;
use DateTime ();
use DateTime::Duration ();
use DateTime::Format::Natural ();

my $parser = DateTime::Format::Natural->new;
my $now    = DateTime->now;
my $delta  = DateTime::Duration->new( days => 30 );
my $cutoff = $now->subtract_duration( $delta );

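# Keep only the dates that are not older than the cutoff
# (DateTime->compare returns -1 when the parsed date is earlier).
my @new_dates = map  { $_->[1] }
                grep { $_->[0] >= 0 }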
                map  { 
                    chomp;
                    [
                        DateTime->compare(
                            $parser->parse_datetime( $_ ),
                            $cutoff
                        ),
                        $_ 
                    ]
                } <DATA>;

print "@new_dates";

__DATA__
2010-07-31
2010-08-31
2010-09-30
2010-10-31
Pedro Silva
Just use `my $now = DateTime->now;` No need to parse `localtime`'s output.
daotoad
Thanks daotoad.
Pedro Silva
OP looking for `YYYY-MM-DD` format, not `YYYY/MM/DD`
Nikhil Jain
The code is the same, either way.
Pedro Silva
This can be a big, unnecessary drain on the performance of your program. The dates are already easy to compare lexicographically once you construct the (single) cutoff date.
brian d foy
Come on, this is the *correct* solution. You're describing a hack that could, potentially, be used after profiling *and* that wouldn't apply to almost any other date format.
Pedro Silva
@brian d foy: Respectfully, I disagree with you. I believe it is better to suffer the overhead and deal with dates properly. Most Perl experts (including you) lecture about using CPAN for a well-tested solution rather than rolling a quick and dirty one. The lex comparison of YYYY-MM-DD may be faster, but it does not catch bad dates such as 2010-02-29 and it potentially suffers from epoch overflow. Who needs a Y2038 problem? (2038 is when 32-bit Unix clocks overflow.) File the lex comparison of YYYY-MM-DD date strings under "cool, maybe useful, not robust". It is not that much easier anyway.
drewk
Sorting lexicographically isn't a hack. The failure modes exist in the Date modules too. Although your solution will eventually produce the right output, I've been in situations where it fails to produce the output in time for it to be useful.
brian d foy
+1  A: 

You can use the Time::ParseDate module:

use strict;
use warnings;
use Time::ParseDate;

my @dates = ('2010-10-12', '2010-09-14', '2010-08-12', '2010-09-13');

# Keep only the dates that fall within the last 30 days.
my $cutoff = parsedate('-30 days');
my @recent = grep { parsedate($_, NO_RELATIVE => 1, UK => 1) > $cutoff } @dates;
print "@recent\n";
# Output at the time of writing: 2010-10-12 2010-09-14
Nikhil Jain
+2  A: 

I guess there's more than one way to do it, but I like Date::Simple for stuff like this.

An example from the docs:

use Date::Simple ('date', 'today');

# Difference in days between two dates:
$diff = date('2001-08-27') - date('1977-10-05');

# Offset $n days from now:
$date = today() + $n;
print "$date\n";  # uses ISO 8601 format (YYYY-MM-DD)

It's great for doing arithmetic on date objects.

It handles dates only, however: no hours, minutes, or seconds.

Øyvind Skaar
+2  A: 

The great thing about YYYY-MM-DD-formatted dates is that you can compare them using simple string comparison. In Perl, that's the lt and gt operators.

In this case, it sounds like you're just looking to check whether the dates in the array are earlier or later than a given target date (which just happens to be "30 days ago"). For that case, the approach I would take would be to first determine what the date was 30 days ago and then compare that as a string against each date in the array. I would not introduce the overhead of converting all the YYYY-MM-DD strings into "proper" date objects, epoch times, etc. and back just for the sake of testing which represents the earlier date.

#!/usr/bin/env perl

use strict;
use warnings;

my $thirty_days = 30 * 24 * 60 * 60;
my ($old_day, $old_month, $old_year) = (localtime(time - $thirty_days))[3..5];
my $cutoff = sprintf('%04d-%02d-%02d', 
                     $old_year + 1900, $old_month + 1, $old_day);

my @dates = ('2010-10-12', '2010-09-12', '2010-08-12', '2010-09-13');
for my $date (@dates) {
  print "$date\n" if $date gt $cutoff;
} 
Dave Sherohman
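
One caveat with subtracting a fixed number of seconds: when a daylight-saving change falls inside the 30-day window, `time - $thirty_days` lands an hour earlier or later on the local clock, which can shift the calendar date if the script runs close to midnight. A minimal sketch of one common workaround, anchoring the calculation at local noon with the core Time::Local module, might look like this (the `cutoff_date` helper is a hypothetical name, not part of the answer above):

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper: YYYY-MM-DD string for $days days before $epoch.
sub cutoff_date {
    my ($epoch, $days) = @_;
    my ($d, $m, $y) = (localtime $epoch)[3 .. 5];
    # Noon of the same local day; a one-hour DST shift cannot cross midnight.
    my $noon = timelocal(0, 0, 12, $d, $m, $y + 1900);
    my ($od, $om, $oy) = (localtime($noon - $days * 24 * 60 * 60))[3 .. 5];
    return sprintf '%04d-%02d-%02d', $oy + 1900, $om + 1, $od;
}

my $cutoff = cutoff_date(time, 30);
my @dates  = ('2010-10-12', '2010-09-12', '2010-08-12', '2010-09-13');

# Same string comparison as in the answer, with the adjusted cutoff.
print "$_\n" for grep { $_ gt $cutoff } @dates;
```

The comparison itself is unchanged; only the way the single cutoff string is built differs.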
I'm surprised that so many people jumped to such complicated, knee-jerk solutions. We have an Item on just this solution in _Effective Perl Programming_. :)
brian d foy
This assumes no errors in the source dates, such as `2010-02-29` which will compare lexicographically but still be an erroneous date. A date package would point out that source date error, which would seem part of the script's job, no?
drewk
The test `$date gt $cutoff` will fail with: 1) delimiters other than the `-` used in the `sprintf`; 2) leading whitespace on strings that should match; 3) trailing whitespace on strings that should not match. Almost all the CPAN date parsing libraries will handle those 3 situations and many, many more. Once you check the delimiters and strip leading and trailing whitespace, are you really saving that much "overhead" or code? Date::Calc is written in C and is really fast. DateTime is incredibly feature rich. Both check for valid date inputs, where this method does not.
drewk
Well, don't forget all the other sources of errors that can trip up all the date modules. For the most part, the sources of errors that you mention are easy to fix or spot, and when generated programmatically, quite rare. Even with an outlier date like a bad leap year, no Date module is going to magically solve it for you.
brian d foy
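
The trade-off debated in these comments can be sketched with core modules only: validate each string once, then keep the cheap string comparison. This is a minimal illustration under stated assumptions, not either commenter's code. The `is_valid_date` helper, the fixed cutoff, and the sample data are hypothetical, and the check relies on Time::Local's `timegm` croaking on out-of-range values. Note that Time::Local treats years below 1000 specially, so the sketch is only meant for ordinary four-digit years.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: true if $s is strictly YYYY-MM-DD and a real date.
sub is_valid_date {
    my ($s) = @_;
    my ($y, $m, $d) = $s =~ /\A(\d{4})-(\d{2})-(\d{2})\z/ or return 0;
    # timegm croaks on out-of-range values such as month 13 or Feb 29, 2010.
    return eval { timegm(0, 0, 0, $d, $m - 1, $y); 1 } ? 1 : 0;
}

my $cutoff = '2010-10-01';    # assumed to be computed once, as in the answer
my @dates  = ('2010-10-12', '2010-02-29', ' 2010-10-20', '2010-09-12');

# Drop malformed or impossible dates, then compare the rest as strings.
my @valid  = grep { is_valid_date($_) } @dates;
my @recent = grep { $_ gt $cutoff } @valid;

print "valid:  @valid\n";
print "recent: @recent\n";
```

Here the impossible date `2010-02-29` and the string with leading whitespace are rejected before the comparison runs, so the remaining `gt` test only ever sees well-formed input.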