In Perl, I am trying to read a log file and print only the lines that have a timestamp between two specific times. The time format is hh:mm:ss, and the timestamp is always the third field on each line. For example, I would be searching for lines that fall between 12:52:33 and 12:59:33.

I am new to Perl and have no idea which route to take to even begin to program this. I am pretty sure this would use some type of regex, but for the life of me I cannot even begin to fathom what that would be. Could someone please assist me with this?

Also, to make this more difficult, I have to do this with the core Perl modules only, because my company will not allow me to use any other modules until they have been tested and verified to have no ill effects on any of the systems the script may interact with.

A:

In pseudocode, you'd do something like this:

  • read in the file line by line:
    • parse the timestamp for this line.
    • if it's less than the start time, skip to the next line.
    • if it's greater than the end time, skip to the next line (or, if the log is sorted by time, stop reading).
    • else: this is a line you want: print it out.
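The steps above translate almost line for line into Perl. This is a minimal sketch, assuming whitespace-separated fields with a zero-padded hh:mm:ss timestamp in the third field; with zero padding, plain string comparison (lt/gt) orders the times correctly, so no date arithmetic is needed. The sub name lines_between and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from $fh whose third whitespace-
# separated field is an hh:mm:ss time within [$start, $end]. Assumes the
# times are zero-padded, so string comparison (lt/gt) matches time order.
sub lines_between {
    my ($fh, $start, $end) = @_;
    my @keep;
    while (my $line = <$fh>) {
        my $ts = (split ' ', $line)[2];                # parse the timestamp
        next unless defined $ts && $ts =~ /^\d\d:\d\d:\d\d$/;
        next if $ts lt $start;                         # before the start time
        next if $ts gt $end;                           # after the end time
        push @keep, $line;                             # a line we want
    }
    return @keep;
}

# Made-up sample log, read from an in-memory file handle for the demo;
# with a real file, open it and pass that handle instead.
my $sample = <<'LOG';
host1 app 12:52:32 too early
host1 app 12:52:33 first match
host1 app 12:55:00 inside
host1 app 12:59:33 last match
host1 app 12:59:34 too late
LOG

open my $in, '<', \$sample or die "open: $!";
print lines_between($in, '12:52:33', '12:59:33');
```

Because each line is judged on its own timestamp, this works even when no line matches the start or end time exactly, and it does not require the log to be sorted.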

This may be too advanced for your needs, but the flip-flop operator .. immediately comes to mind as something that would be useful here.
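For readers new to it: in scalar (boolean) context, `..` is false until its left operand becomes true, then stays true up to and including the line on which its right operand becomes true (see "Range Operators" in perldoc perlop). A small illustration on in-memory lines; the START/END markers are invented for the example.

```perl
use strict;
use warnings;

my @lines = ('noise', 'START here', 'middle', 'END here', 'after');

# /START/ .. /END/ acts as a flip-flop: it switches on at the first line
# matching /START/ and switches off after the first line matching /END/,
# including that END line. Note that `..` tests /END/ on the same line
# that switched it on; the three-dot form `...` waits until the next line.
my @selected;
for (@lines) {
    push @selected, $_ if /START/ .. /END/;
}
print "$_\n" for @selected;    # START here, middle, END here
```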

For reading a file line by line (the files named on the command line, or STDIN if there are none), this is the conventional pattern:

while (my $line = <>)
{
     # do stuff...
}

Parsing a line into fields can be done easily with split (see perldoc -f split). You will probably need to split the line by tabs or spaces, depending on the format.
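One detail worth knowing: `split ' '` (a single-space string) is special-cased to split on runs of whitespace and to ignore leading whitespace, whereas `split /\s+/` yields an empty first field when the line starts with whitespace, which shifts every field index by one. A quick comparison on a made-up log line:

```perl
use strict;
use warnings;

my $line = "  host1 app 12:55:00 something happened\n";

my @by_regex = split /\s+/, $line;   # leading whitespace -> empty first field
my @by_space = split ' ',   $line;   # awk-style: leading whitespace ignored

print "regex: third field = '$by_regex[2]'\n";   # 'app'
print "space: third field = '$by_space[2]'\n";   # '12:55:00'
```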

Once you've got the particular field containing the timestamp, you can examine it with a regular expression tailored to its format. Read about those in perldoc perlre.
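For the timestamp field itself, a regex anchored at both ends can check the hh:mm:ss shape and capture the three parts in one step. A sketch, assuming two-digit, zero-padded components; parse_hms is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return (hours, minutes, seconds) if $field looks like hh:mm:ss,
# or an empty list otherwise. Range checks reject values like 25:61:99.
sub parse_hms {
    my ($field) = @_;
    return unless defined $field;
    my ($h, $m, $s) = $field =~ /^(\d\d):(\d\d):(\d\d)$/ or return;
    return if $h > 23 || $m > 59 || $s > 59;
    return ($h, $m, $s);
}

my @parts = parse_hms('12:52:33');
print "@parts\n";                          # prints: 12 52 33
my @bad = parse_hms('12:5:33');
print(@bad ? "parsed\n" : "rejected\n");   # prints: rejected
```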

Here's something which might get you closer:

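use strict;
use warnings;

use POSIX 'mktime';

# mktime() needs a full date (sec, min, hour, mday, mon, year); any fixed
# date works here because only times of day are compared.
my $starttime = mktime(33, 52, 12, 1, 0, 100);
my $endtime   = mktime(33, 59, 12, 1, 0, 100);

while (my $line = <>)
{
    # split into fields on whitespace (' ' also ignores leading whitespace)
    my @fields = split(' ', $line);

    # the timestamp is the 3rd field; skip lines that don't have one
    my $timestamp = $fields[2];
    next unless defined $timestamp && $timestamp =~ /^\d+:\d+:\d+$/;

    my ($hour, $min, $sec) = split(':', $timestamp);
    my $time = mktime($sec, $min, $hour, 1, 0, 100);

    # keep only lines inside the window (inclusive at both ends)
    next unless $time >= $starttime && $time <= $endtime;
    print $line;
}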
Ether
And if you want O(log N) instead of O(N), you can use binary search instead of reading every line (assuming the log file is sorted by timestamp).
serg
Such a task is well-suited to the flip-flop operator.
Zaid
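To sketch serg's binary-search suggestion: if the log is sorted by time, you can bisect byte offsets with seek instead of reading every line, then read forward from the first line in the window. This is a minimal illustration, not a tested tool; it assumes a seekable (on-disk) file sorted by time, whose third whitespace-separated field is a zero-padded hh:mm:ss timestamp. The helper names are hypothetical. The core module Search::Dict performs a similar search for files sorted on the whole line.

```perl
use strict;
use warnings;

# Timestamp of a line: its third whitespace-separated field (assumed hh:mm:ss).
sub ts_of { my $ts = (split ' ', $_[0])[2]; return defined $ts ? $ts : '' }

# Position $fh at the first line whose timestamp is ge $key, assuming the
# file is sorted by timestamp. Uses O(log N) seeks instead of a full scan.
sub seek_to_time {
    my ($fh, $key) = @_;
    my ($lo, $hi) = (0, -s $fh || 0);
    # Invariant: the first full line starting after byte $lo is before $key
    # (or $lo == 0); the first full line starting after byte $hi is not.
    while ($hi - $lo > 1) {
        my $mid = int(($lo + $hi) / 2);
        seek $fh, $mid, 0 or die "seek: $!";
        <$fh>;                                 # skip the line containing $mid
        my $line = <$fh>;
        if (defined $line && ts_of($line) lt $key) { $lo = $mid }
        else                                       { $hi = $mid }
    }
    # Short linear scan from just after $lo to the exact line.
    seek $fh, $lo, 0 or die "seek: $!";
    <$fh> if $lo > 0;
    while (1) {
        my $pos  = tell $fh;
        my $line = <$fh>;
        if (!defined $line || ts_of($line) ge $key) {
            seek $fh, $pos, 0 or die "seek: $!";
            return;
        }
    }
}

# Return the lines from $start through $end (inclusive) of a sorted log.
sub lines_in_window {
    my ($path, $start, $end) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    seek_to_time($fh, $start);
    my @out;
    while (my $line = <$fh>) {
        last if ts_of($line) gt $end;          # sorted: nothing later can match
        push @out, $line;
    }
    close $fh;
    return @out;
}

# Usage (the path is a placeholder):
#   print lines_in_window('/path/to/sorted.log', '12:52:33', '12:59:33');
```

Like the linear versions, this does not need an exact match for the start or end time; it only needs the file to be in time order.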
A:

If each line in the file has the time stamp, then in 'sed' you could write:

sed -n '/12:52:33/,/12:59:33/p' logfile

This prints every line from the first one matching 12:52:33 through the first one matching 12:59:33.

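There is a Perl program, s2p, that converts 'sed' scripts to Perl. (It shipped with Perl until version 5.22 removed it from the core distribution; it is now available separately from CPAN.)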

The basic Perl structure is along the lines of:

my $atfirst = 0;
my $atend = 0;
while (<>)
{
    last if $atend;
    $atfirst = 1 if m/12:52:33/;
    $atend = 1 if m/12:59:33/;
    if ($atfirst)
    {
        print;    # process the line as required
    }
}

Note that, as written, the code processes the first line that matches the end marker. If you don't want that, move the 'last' to just after the line that sets $atend.
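Wrapping that loop in a sub makes the two variants easy to compare. A sketch under the same exact-marker assumption; the sub name marker_range and the $include_end flag are illustrative, not part of the original answer.

```perl
use strict;
use warnings;

# Collect lines from the first one matching $start_re up to the first one
# matching $end_re, mirroring the $atfirst/$atend loop. If $include_end is
# false, a `last` runs right after the end test, so the end line is dropped.
sub marker_range {
    my ($fh, $start_re, $end_re, $include_end) = @_;
    my ($atfirst, $atend) = (0, 0);
    my @out;
    while (my $line = <$fh>) {
        last if $atend;                      # original position of `last`
        $atfirst = 1 if $line =~ $start_re;
        $atend   = 1 if $line =~ $end_re;
        last if $atend && !$include_end;     # `last` moved after the test
        push @out, $line if $atfirst;        # "process line as required"
    }
    return @out;
}

my $log = "12:52:32 a\n12:52:33 b\n12:55:00 c\n12:59:33 d\n12:59:34 e\n";
for my $incl (1, 0) {
    open my $fh, '<', \$log or die "open: $!";
    my @got = marker_range($fh, qr/12:52:33/, qr/12:59:33/, $incl);
    print "include_end=$incl:\n", @got;
}
```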

Jonathan Leffler
A: 

If your log files are segregated by day, you could convert the timestamps to seconds and compare those. (If not, use the technique from my answer to a question you asked earlier.)

Say your log is:

12:52:32 outside
12:52:43 strictly inside
12:59:33 end
12:59:34 outside

Then with the following script, saved as between,

#! /usr/bin/perl

use warnings;
use strict;

my $LOGPATH = "/tmp/foo.log";

sub usage { "Usage: $0 start-time end-time\n" }

sub to_seconds {
  my($h,$m,$s) = split /:/, $_[0];
  $h * 60 * 60 +
       $m * 60 +
            $s;
}

die usage unless @ARGV == 2;
my($start,$end) = map to_seconds($_), @ARGV;

open my $log, "<", $LOGPATH or die "$0: open $LOGPATH: $!";
while (<$log>) {
  if (/^(\d+:\d+:\d+)\s+/) {
    my $time = to_seconds $1;
    print if $time >= $start && $time <= $end;
  }
  else {
    warn "$0: $LOGPATH:$.: no timestamp!\n";
  }
}

you'd get the following output:

$ ./between 12:52:33 12:59:33
12:52:43 strictly inside
12:59:33 end
Greg Bacon
A: 

If the start and end times are known, a Perl one-liner with a flip-flop operator is what you need:

perl -ne 'print if /12:52:33/../12:59:33/' logFile

If there is some underlying logic needed in order for you to determine the start and end times, then 'unroll' the one-liner to a formal script:

use strict;
use warnings;

open my $log, '<', 'logFile' or die "Cannot open logFile: $!";

my $startTime = get_start_time();  # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time();      # Sets $endTime in hh:mm:ss format

while ( <$log> ) {

    print if /$startTime/../$endTime/;
}

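As noted in Ether's comment, this will fail if the exact time is not present. If that is a possibility, one might implement the following logic instead (it compares hh:mm:ss strings, so it assumes zero-padded times, and it stops at the first line past the end time, so it assumes the log is sorted by time):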

use strict;
use warnings;

open my $log, '<', 'logFile' or die "Cannot open logFile: $!";

my $startTime = get_start_time();  # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time();      # Sets $endTime in hh:mm:ss format

while ( <$log> ) {

    my $time = (split /,/, $_)[2];      # Assuming fields are comma-separated
                                        # and the timestamp is the 3rd field
    next unless defined $time;          # Skip lines without a 3rd field

    last  if $time gt $endTime;         # Stop when stop time reached
    print if $time ge $startTime;
}
Zaid
That conditional will fail if there is not a line with a timestamp that exactly matches the start or end time.
Ether
@Ether: Agreed. This is what happens when the OP doesn't specify sufficient information about the problem.
Zaid