ansaurus

Question

How to search for lines in a file between two timestamps using Perl?

Answer 1

+2 A:

In pseudocode, you'd do something like this:

read in the file line by line:
- parse the timestamp for this line.
- if it's less than the start time, skip to the next line.
- if it's greater than the end time, skip to the next line (or stop reading, if the file is sorted by time).
- else: this is a line you want: print it out.

This may be too advanced for your needs, but the flip-flop operator .. immediately comes to mind as something that would be useful here.
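As a small illustration of how the flip-flop operator behaves, here is a sketch that runs it over a hypothetical in-memory list of lines rather than a real log. The sample data and the helper name lines_between are assumptions made up for this example. Note that the range only opens on a line that actually matches the start pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from the first one matching $start
# through the next one matching $end, inclusive, using the flip-flop operator.
sub lines_between {
    my ($start, $end, @lines) = @_;
    my @selected;
    for my $line (@lines) {
        # False until a line matches $start, then true up to and including
        # the first subsequent line that matches $end.
        # The operator keeps its state between calls to this sub, so a range
        # left open by one call would carry over into the next.
        push @selected, $line if $line =~ /\Q$start\E/ .. $line =~ /\Q$end\E/;
    }
    return @selected;
}

# Made-up sample data
my @sample = (
    "12:52:32 before\n",
    "12:52:33 first\n",
    "12:55:00 middle\n",
    "12:59:33 last\n",
    "12:59:34 after\n",
);
print lines_between('12:52:33', '12:59:33', @sample);
```

With this sample, the three lines from "first" through "last" are printed. If no line contained 12:52:33 exactly, nothing would be selected.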

For reading line by line from stdin (or from files named on the command line), this is the conventional pattern:

while (my $line = <>)
{
     # do stuff...
}

Parsing a line into fields can be done easily with split (see perldoc -f split). You will probably need to split the line by tabs or spaces, depending on the format.

Once you've got the particular field (containing the timestamp), you can examine it using a customized regexp. Read about those at perldoc perlre.
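To make the split-then-regexp idea concrete, here is a minimal sketch. It assumes a syslog-like line in which the 3rd whitespace-separated field is an hh:mm:ss timestamp. The sample line and the helper name extract_timestamp are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the hh:mm:ss timestamp out of the 3rd
# whitespace-separated field. Returns undef if the field is missing
# or does not look like a time.
sub extract_timestamp {
    my ($line) = @_;
    my @fields = split ' ', $line;    # ' ' also skips leading whitespace
    my $field  = $fields[2];
    return unless defined $field;
    return $field =~ /^\d{2}:\d{2}:\d{2}$/ ? $field : undef;
}

# Made-up sample line
my $line = "Jun 28 12:52:43 myhost example message\n";
my $ts   = extract_timestamp($line);
print defined $ts ? "timestamp: $ts\n" : "no timestamp\n";
```

Validating the field before using it avoids warnings on blank or malformed lines.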

Here's something which might get you closer:

use strict;
use warnings;

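# convert hh:mm:ss to seconds since midnight
sub to_seconds
{
    my ($hour, $min, $sec) = split(':', shift);
    return $hour * 3600 + $min * 60 + $sec;
}

my $starttime = to_seconds('12:52:33');
my $endtime = to_seconds('12:59:33');

while (my $line = <>)
{
    # split into fields on whitespace (' ' also skips leading whitespace)
    my @fields = split(' ', $line);

    # the timestamp is the 3rd field; skip lines that don't have one
    my $timestamp = $fields[2];
    next unless defined $timestamp && $timestamp =~ /^\d+:\d+:\d+$/;

    my $time = to_seconds($timestamp);

    # skip lines outside the window (both ends inclusive)
    next if $time < $starttime || $time > $endtime;
    print $line;
}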

Ether 2010-06-28 18:16:12

And if you want O(log N) instead of O(N) for locating the start of the range, you can use binary search instead of reading every line (assuming the log file is sorted by timestamp).
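A rough sketch of the binary-search idea follows. It assumes every line begins with a zero-padded hh:mm:ss timestamp and that the file is sorted by it. The helper names (timestamp_of, first_offset_at_or_after, lines_in_window) are made up for this example, and an in-memory filehandle stands in for a real log file.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_END);

# Assumption: every line starts with a zero-padded hh:mm:ss timestamp.
sub timestamp_of {
    my ($line) = @_;
    my ($ts) = $line =~ /^(\d\d:\d\d:\d\d)/
        or die "no timestamp in line: $line";
    return $ts;
}

# Return the byte offset of the first line whose timestamp is ge $target,
# or the file size if there is no such line.
sub first_offset_at_or_after {
    my ($fh, $target) = @_;
    seek $fh, 0, SEEK_END or die "seek: $!";
    my $size = tell $fh;
    my ($lo, $hi, $answer) = (0, $size, $size);

    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);

        # Position at the first line that starts at or after $mid.
        if ($mid > 0) {
            seek $fh, $mid - 1, SEEK_SET or die "seek: $!";
            my $discard = <$fh>;    # skip to the end of the current line
        }
        else {
            seek $fh, 0, SEEK_SET or die "seek: $!";
        }
        my $start = tell $fh;
        my $line;
        $line = <$fh> if $start < $hi;

        if (!defined $line) {
            $hi = $mid;             # no line starts in [$mid, $hi)
        }
        elsif (timestamp_of($line) ge $target) {
            ($answer, $hi) = ($start, $mid);
        }
        else {
            $lo = tell $fh;         # every line up to here is too early
        }
    }
    return $answer;
}

# Jump to the first line in range, then read sequentially until past $to.
sub lines_in_window {
    my ($fh, $from, $to) = @_;
    seek $fh, first_offset_at_or_after($fh, $from), SEEK_SET or die "seek: $!";
    my @found;
    while (my $line = <$fh>) {
        last if timestamp_of($line) gt $to;
        push @found, $line;
    }
    return @found;
}

# Made-up sample log, held in memory for the demonstration
my $log = join '', map { "$_\n" }
    '12:52:32 outside', '12:52:43 strictly inside',
    '12:59:33 end',     '12:59:34 outside';
open my $fh, '<', \$log or die "open: $!";
print lines_in_window($fh, '12:52:33', '12:59:33');
```

Only the search for the first matching line is logarithmic. Reading the matching lines themselves is still linear in the size of the result.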

serg 2010-06-28 18:55:45

Such a task is well-suited to the flip-flop operator.

Zaid 2010-06-28 20:44:17

Answer 2

+1 A:

If each line in the file has the time stamp, then in 'sed' you could write:

sed -n '/12:52:33/,/12:59:33/p' logfile

This will print the relevant lines, provided the log contains lines with exactly those two timestamps.

There is a Perl program, s2p, that will convert 'sed' scripts to Perl.

The basic Perl structure is along the lines of:

my $atfirst = 0;
my $atend = 0;
while (<>)
{
    last if $atend;
    $atfirst = 1 if m/12:52:33/;
    $atend = 1 if m/12:59:33/;
    if ($atfirst)
    {
        print;    # process line as required
    }
}

Note that as written, the code will process the first line that matches the end marker. If you don't want that, move the 'last' to just after the test for the end marker.
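A sketch of that variant, with the 'last' moved after the end-marker test so the end line itself is not processed. To keep it self-contained it works on a hypothetical in-memory list of lines. The sample data and the helper name lines_before_end_marker are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect lines from the start marker up to, but not
# including, the first line that matches the end marker.
sub lines_before_end_marker {
    my (@lines) = @_;
    my $atfirst = 0;
    my $atend   = 0;
    my @kept;
    for my $line (@lines) {
        $atfirst = 1 if $line =~ m/12:52:33/;
        $atend   = 1 if $line =~ m/12:59:33/;
        last if $atend;    # moved after the end-marker test
        push @kept, $line if $atfirst;
    }
    return @kept;
}

# Made-up sample data
my @sample = (
    "12:52:32 before\n",
    "12:52:33 first\n",
    "12:55:00 middle\n",
    "12:59:33 end marker\n",
    "12:59:34 after\n",
);
print lines_before_end_marker(@sample);
```

With this sample, only the "first" and "middle" lines are printed.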

Jonathan Leffler 2010-06-28 18:26:06

Answer 3

A:

If your log files are segregated by day, you could convert the timestamps to seconds and compare those. (If not, use the technique from my answer to a question you asked earlier.)
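The earlier answer referred to above is not reproduced here. As one possible approach for logs that span several days, the sketch below converts full date-and-time stamps to epoch seconds with the core Time::Piece module. The "YYYY-MM-DD hh:mm:ss" format, the sample lines, and the helper names to_epoch and in_window are all assumptions for the example.

```perl
use strict;
use warnings;
use Time::Piece;

# Assumption: timestamps look like "2010-06-28 12:52:33".
sub to_epoch {
    my ($stamp) = @_;
    return Time::Piece->strptime($stamp, '%Y-%m-%d %H:%M:%S')->epoch;
}

# True if $stamp lies within [$start, $end], both ends inclusive.
sub in_window {
    my ($stamp, $start, $end) = @_;
    my $t = to_epoch($stamp);
    return $t >= to_epoch($start) && $t <= to_epoch($end);
}

# Made-up sample log that crosses midnight
my @log = (
    "2010-06-28 23:59:58 outside\n",
    "2010-06-28 23:59:59 inside\n",
    "2010-06-29 00:00:01 inside, next day\n",
    "2010-06-29 00:00:03 outside\n",
);
for my $line (@log) {
    my ($stamp) = $line =~ /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)/ or next;
    print $line
        if in_window($stamp, '2010-06-28 23:59:59', '2010-06-29 00:00:02');
}
```

Comparing epoch seconds keeps the test correct across midnight, where a seconds-since-midnight comparison would wrap around.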

Say your log is

12:52:32 outside
12:52:43 strictly inside
12:59:33 end
12:59:34 outside

Then with

#! /usr/bin/perl

use warnings;
use strict;

my $LOGPATH = "/tmp/foo.log";

sub usage { "Usage: $0 start-time end-time\n" }

sub to_seconds {
  my($h,$m,$s) = split /:/, $_[0];
  $h * 60 * 60 +
       $m * 60 +
            $s;
}

die usage unless @ARGV == 2;
my($start,$end) = map to_seconds($_), @ARGV;

open my $log, "<", $LOGPATH or die "$0: open $LOGPATH: $!";
while (<$log>) {
  if (/^(\d+:\d+:\d+)\s+/) {
    my $time = to_seconds $1;
    print if $time >= $start && $time <= $end;
  }
  else {
    warn "$0: $LOGPATH:$.: no timestamp!\n";
  }
}

you'd get the following output:

$ ./between 12:52:33 12:59:33
12:52:43 strictly inside
12:59:33 end

Greg Bacon 2010-06-28 19:22:07

Answer 4

A:

If the start and end times are known, a Perl one-liner with a flip-flop operator is what you need:

perl -ne 'print if /12:52:33/../12:59:33/' logFile

If there is some underlying logic needed in order for you to determine the start and end times, then 'unroll' the one-liner to a formal script:

use strict;
use warnings;

open my $log, '<', 'logFile' or die "Cannot open logFile: $!";

my $startTime = get_start_time();  # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time();      # Sets $endTime in hh:mm:ss format

while ( <$log> ) {

    print if /$startTime/../$endTime/;
}

As Ether's comment notes, this will fail if no line carries the exact start or end time. If this is a possibility, one might implement the following logic instead:

use strict;
use warnings;

open my $log, '<', 'logFile' or die "Cannot open logFile: $!";

my $startTime = get_start_time();  # Sets $startTime in hh:mm:ss format
my $endTime = get_end_time();      # Sets $endTime in hh:mm:ss format

while ( <$log> ) {

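    my $time = (split /,/, $_)[2];      # Assuming fields are comma-separated
                                        # and timestamp is 3rd field
    next unless defined $time;          # Skip lines with too few fields

    # String comparison works because hh:mm:ss is zero-padded
    last  if $time gt $endTime;         # Stop once past the end time
    print if $time ge $startTime;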
}

Zaid 2010-06-28 20:43:14

That conditional will fail if there is not a line with a timestamp that exactly matches the start or end time.

Ether 2010-06-28 21:56:00

@Ether : Agreed. This is what happens when the OP doesn't specify sufficient information about the problem.

Zaid 2010-06-29 06:06:01
