In Perl, how can I efficiently parse the output of the Unix date command, taking the time zone into account, and convert the result to UTC?

I've read many similar questions on Stack Overflow, but few seem to handle parsing multiple time zones. Instead, they set the time zone manually and assume it stays fixed.

# Example Input Strings:
my @inputs = (
              'Tue Oct 12 06:31:48 EDT 2010',
              'Tue Oct 12 07:49:54 BST 2010',
             );

I tried the following to no avail:

use Time::Piece;

foreach my $input ( @inputs ) {
  my $t = Time::Piece->strptime( $input,
                                 '%a %b %d %T %Z %Y' );
  print $t->cdate, "\n";
}

It seems the problem is the time zone (%Z). Additionally, a time zone field does not seem to exist in Time::Piece, which would require me to write custom code to convert to UTC, which just seems... wrong.

Context: I'm attempting to parse legacy logs from a variety of sources that use the unix date command for timestamps. Ideally, I'd like to convert all timestamps to UTC.

Any help would be greatly appreciated.

+4  A: 

The Perl DateTime FAQ on timezones gives good background on why EDT and EST cannot be used in most conversions. The issue is that other countries also have an Eastern time zone with the same three-letter abbreviations, so EST and EDT are ambiguous without other clues.

You might look at other modules, or just assume that "EDT" means "EST5EDT" if that holds for your data.
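
If you are willing to make that kind of assumption, here is a minimal sketch that stays with the core Time::Piece module. Note that it swaps in fixed UTC offsets rather than the EST5EDT zone name, and that the %utc_offset_seconds table and the date_output_to_utc helper are hypothetical names of mine, not part of any module. The offsets assume EDT is US Eastern Daylight Time and BST is British Summer Time, which may not be true for every log source.

```perl
use strict;
use warnings;
use Time::Piece;

# Assumption: what each abbreviation means in *your* logs.
# This table is hypothetical; extend or correct it as needed.
my %utc_offset_seconds = (
    EDT => -4 * 3600,
    BST =>  1 * 3600,
    GMT =>  0,
    UTC =>  0,
);

# Hypothetical helper: returns a Time::Piece holding the UTC instant,
# or nothing when the input has an unexpected shape or an unknown zone.
sub date_output_to_utc {
    my ($input) = @_;
    my @fields = split ' ', $input;
    return unless @fields == 6;

    my $zone = splice @fields, 4, 1;    # remove e.g. "EDT"
    return unless exists $utc_offset_seconds{$zone};

    # Without a zone, strptime reads the remaining fields as UTC wall-clock time.
    my $t = Time::Piece->strptime( "@fields", '%a %b %d %H:%M:%S %Y' );

    # Subtracting the zone's offset from that reading gives the real UTC instant.
    return $t - $utc_offset_seconds{$zone};
}

for my $input ( 'Tue Oct 12 06:31:48 EDT 2010', 'Tue Oct 12 07:49:54 BST 2010' ) {
    my $utc  = date_output_to_utc($input);
    my $text = defined $utc ? $utc->datetime . 'Z' : 'unknown zone';
    print "$input == $text\n";
}
```

Returning nothing for an unknown abbreviation, instead of guessing, lets you log the line and decide later what that source meant.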

drewk
I added "$input =~ s/ EDT / EST5EDT /;" before calling strptime, but strptime still fails to parse the string. I also still believe Time::Piece is insufficient, since it does not store the timezone; it only "allows" it to be passed through the FORMAT string :(
vlee
Thank you very much for pointing out the short timezone name ambiguity though!
vlee
@vlee: You may need to use another module. There are many CPAN modules in the `DateTime::Format::*` group.
drewk
DateTime::Format::Strptime looks especially promising. I'll try that soon and really hope it captures %Z unlike Time::Piece.
vlee
+3  A: 

If you know how to disambiguate the TZs, just pop them into a dispatch table:

use strict; use warnings;
use DateTime::Format::Strptime ();

my @inputs = (
    'Tue Oct 12 06:31:48 EDT 2010',
    'Tue Oct 12 07:49:54 BST 2010',
);

my %tz_dispatch = (
    EDT => build_parser( 'EST5EDT' ),
    BST => build_parser( '+0100' ),
    # ... etc
    default => build_parser( ),
);

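for my $input (@inputs) {
    my ($parser, $date) = parse_tz( $input, %tz_dispatch );
    # Parse in the source zone, then convert to UTC as the question asks.
    my $dt = $parser->parse_datetime( $date );
    $dt->set_time_zone( 'UTC' );
    print $dt, "\n";
}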

sub build_parser {
    my ($tz) = @_;

    my %conf = (
        pattern   => '%a %b %d %T %Z %Y',
        on_error  => 'croak',
    );
    # With an explicit zone, drop %Z from the pattern and let time_zone supply it.
    @conf{qw/time_zone pattern/} = ($tz, '%a %b %d %T %Y')
        if $tz;

    return DateTime::Format::Strptime->new( %conf );
}

sub parse_tz {
    my ($date, %tz_dispatch) = @_;
    my (@date) = split /\s/, $date;

    my $parser = $tz_dispatch{splice @date, 4, 1};

    return $parser
        ? ($parser, join ' ', @date)
        : ($tz_dispatch{default}, $date);
}
Pedro Silva
Thanks, your code definitely works. However, now I'm more confused about the %Z identifier. In your code, a new DateTime::Format::Strptime is created for each of the EDT (EST5EDT) and BST (+0100) time zones, instead of using the same object and parsing the entire string with parse_datetime. I tried "Tue Oct 12 08:00:00 GMT 2010", which worked with the default object. However, when I try "UTC" or "EST5EDT", the default object croaks with "I don't recognise the timezone <foo>". I'm guessing this is expected behavior, but I'm not sure why. I wonder what the recognizable/acceptable timezone strings for %Z are.
vlee
The Strptime parser takes a string; if the string includes a timezone, the parser attempts to pass it on to DateTime::TimeZone. If the string does not include a timezone, the parser constructor needs the `time_zone` parameter. I also had a hard time figuring out the appropriate, unambiguous timezone names. Basically, anything of the form '[-+]\d{4}' works. Hope this helps.
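
Since numeric offsets are the unambiguous form, one possible variation is to rewrite the abbreviation to an offset first and keep a single parser. This is only a sketch under assumptions: the %offset_for table and the parse_as_utc helper are hypothetical, the offsets reflect my reading of EDT and BST, and it uses the documented %z (numeric offset) specifier rather than %Z.

```perl
use strict;
use warnings;
use DateTime::Format::Strptime ();

# Assumption: hypothetical mapping from abbreviation to numeric offset.
my %offset_for = (
    EDT => '-0400',
    BST => '+0100',
    GMT => '+0000',
    UTC => '+0000',
);

# One parser for every input, because the zone is always a numeric offset.
my $parser = DateTime::Format::Strptime->new(
    pattern  => '%a %b %d %T %z %Y',
    on_error => 'croak',
);

# Hypothetical helper: returns a DateTime object already converted to UTC.
sub parse_as_utc {
    my ($input) = @_;
    my @fields = split ' ', $input;

    my $offset = $offset_for{ $fields[4] // '' }
        or die "No offset known for the zone in '$input'\n";
    $fields[4] = $offset;    # e.g. "EDT" becomes "-0400"

    my $dt = $parser->parse_datetime("@fields");
    $dt->set_time_zone('UTC');
    return $dt;
}

print parse_as_utc($_)->iso8601, "Z\n"
    for 'Tue Oct 12 06:31:48 EDT 2010', 'Tue Oct 12 07:49:54 BST 2010';
```

The trade-off against the dispatch table is that a fixed offset knows nothing about daylight saving rules, which is fine here because the abbreviation itself already says whether DST was in effect.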
Pedro Silva
+1: That is a good answer...
drewk
A: 

I've always found ParseDate from Date::Manip to be good for these sorts of situations.

use strict;
use warnings qw<FATAL all>;
use Date::Manip qw<ParseDate UnixDate>;

my @inputs = (
    q<Tue Oct 12 06:31:48 EDT 2010>,
    q<Tue Oct 12 07:49:54 BST 2010>,
);

sub date2epoch($) {
    my $user_string = shift();
    my $timestamp   = ParseDate($user_string);
    my $seconds     = UnixDate($timestamp, "%s");
    return $seconds;
}

sub epoch2utc($) {
    my $seconds = shift();
    return gmtime($seconds) . q< UTC>;
}

for my $random_date (@inputs) {
    my $epoch_seconds = date2epoch($random_date);
    my $normal_date   = epoch2utc($epoch_seconds);
    print "$random_date == $normal_date\n";
}

When run, that produces this:

Tue Oct 12 06:31:48 EDT 2010 == Tue Oct 12 10:31:48 2010 UTC
Tue Oct 12 07:49:54 BST 2010 == Tue Oct 12 06:49:54 2010 UTC

which seems to be what you're looking for.

tchrist
A: 

I'm a little late on this, but GNU date itself is good at parsing dates:

$ date -u -d 'Thu Oct 14 01:17:00 EDT 2010'
Thu Oct 14 05:17:00 UTC 2010

I don't know how it resolves the EDT ambiguity though.
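
If you want to stay in Perl but let date do the parsing, something like the following sketch might work. It assumes a GNU coreutils date is first on your PATH (BSD and macOS date treat -d differently), and gnu_date_to_epoch is a hypothetical helper name of mine. It inherits whatever choice GNU date makes for ambiguous abbreviations.

```perl
use strict;
use warnings;

# Assumption: `date` on PATH is GNU date, which accepts -d STRING and +%s.
sub gnu_date_to_epoch {
    my ($input) = @_;

    # List-form open bypasses the shell, so $input needs no quoting.
    open my $fh, '-|', 'date', '-u', '-d', $input, '+%s'
        or die "Cannot run date: $!\n";
    my $epoch = <$fh>;

    # A false close means date exited non-zero, i.e. it rejected the input.
    close $fh or return;
    return unless defined $epoch;

    chomp $epoch;
    return $epoch;
}

my $epoch = gnu_date_to_epoch('Thu Oct 14 01:17:00 EDT 2010');
my $text  = defined $epoch ? scalar( gmtime $epoch ) . ' UTC' : 'unparsed';
print "$text\n";
```

Spawning one process per log line is slow for large files, so this is probably best kept for spot checks or small inputs.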

Jander