Though I almost always tell people to go use one of the many excellent modules from the CPAN for this, most of them do have one major drawback - speed. If you're parsing a large number of log files in real-time, that can sometimes be an issue. In those cases, rolling your own can often be a more suitable solution, but there are many pitfalls and nuances that must be considered and handled properly. Hence the preference for using a known-correct, proven, reliable module written by somebody else. :)
However, before I even considered my advice above, I looked at your code and had converted it to perl in my head... therefore, here is a more-or-less direct conversion of your gawk code into perl. I've tried to write it as simply as possible, so as to highlight some of the more delicate parts of dealing with dates and times in perl by hand.
# import the mktime function from the (standard) POSIX module
use POSIX qw( mktime );
sub log4jTimeStampToMillis {
my ($log4jts, $dst) = @_;
# extract the millisecond field
my ($tsstr, $millis) = split( ',', $log4jts );
# extract values to pass to mktime()
my @mktime_args = reverse split( '[-: ]', $tsstr );
# munge values for posix compatibility (ugh)
$mktime_args[3] -= 1;
$mktime_args[4] -= 1;
$mktime_args[5] -= 1900;
# print Dumper \@mktime_args; ## DEBUG
# convert, make sure to account for daylight savings
my $seconds = mktime( @mktime_args, 0, 0, $dst );
# return that time as milliseconds since the epoch
return $seconds * 1000 + $millis;
}
One important difference between my code and yours - my log4jTimeStampToMillis subroutine takes two parameters:
- the log timestamp string
- whether or not that timestamp is using daylight savings time ( 1 for true, 0 for false )
Of course, you could just add code to detect if that time falls in DST or not and adjust automatically, but I was trying to keep it simple. :)
NOTE: If you uncomment the line marked DEBUG, make sure to add "use Data::Dumper;" before that line in your program so it will work.
Here's an example of how you could test that subroutine:
my $milliseconds = log4jTimeStampToMillis( "2009-05-10 00:48:41,905", 1 );
my $seconds = int( $milliseconds / 1000 );
my $local = scalar localtime( $seconds );
print "ms: $milliseconds\n"; # ms: 1241844521905
print "sec: $seconds\n"; # sec: 1241844521
print "local: $local\n"; # local: Sat May 9 00:48:41 2009