ansaurus

Question

Perl script to search pattern and concat lines in a file

Answer 1

+2 A:

If the log file is not too large to keep in memory, you can just keep a hash of date string => text. Something like this:

my %h;
my $cur = "*** No date ***";
while(<>) {
  if (m"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d{4})") {
    $cur = $1;
  } else {
    $h{$cur} .= $_ unless /^\s*$/;
  }
}

print "$_\n$h{$_}\n" foreach (sort keys %h);

Try saving this as t.pl and running it as perl t.pl < yourlog.txt. Adjust the regex if needed.

Igor Krivokon 2009-06-12 21:45:59

That's a few too many toothpicks for me: m{^( \d{2} / \d{2} / \d{2} \s \d{2} : \d{2} : \d{2} : \d{4} ) }x

Sinan Ünür 2009-06-12 23:29:42

Sinan, \d\d is shorter than \d{2} and looks better to me. As for changing the delimiter with m to avoid the extra backslashes, that makes sense. Thanks, editing my answer.

Igor Krivokon 2009-06-12 23:47:24

\d\d may be shorter, but it's far less readable.

James Thompson 2009-06-12 23:55:33
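For comparison, here is a small sketch that puts the two spellings from these comments side by side. The sample lines are made up for illustration and are not from the asker's log. Note that under /x a literal space in the pattern is ignored, so the spaced-out version uses \s, which is slightly looser than a plain space (it also accepts a tab).

```perl
use strict;
use warnings;

# Compact spelling, as used in the answer.
my $compact = qr{^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d{4})};

# Spaced-out spelling with /x, as suggested in the comment.
# Whitespace inside the pattern is ignored, so the gap between
# date and time has to be written as \s.
my $spaced = qr{
    ^(
        \d{2} / \d{2} / \d{2}      # date, two digits per field
        \s
        \d{2} : \d{2} : \d{2}      # time, two digits per field
        : \d{4}                    # four-digit fraction
    )
}x;

# Hypothetical sample lines (assumptions, for illustration only).
my @samples = (
    "01/01/70 12:00:00:0004",
    "This is line 3",
    "1/1/70 12:00:00:0004",
);

for my $line (@samples) {
    my $c = $line =~ $compact ? "match" : "no match";
    my $s = $line =~ $spaced  ? "match" : "no match";
    print "$line: compact=$c spaced=$s\n";
}
```

Which one reads better is a matter of taste; both anchor at the start of the line and capture the whole timestamp in $1.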

Answer 2

+1 A:

It may be a good idea to do this in two stages if the input is huge: create a SQLite database with a single table with columns for the timestamp and line (and maybe line number and file name). Then you can output the data any way you want.

Sinan Ünür 2009-06-12 22:14:14
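The two-stage idea above might be sketched roughly as follows. This is only an illustration, not code from the answer: it assumes DBI and DBD::SQLite are installed (neither is a core module), and the table name log_lines, its columns, and the helper subs load_log and grouped_text are all made up here. The demo uses an in-memory database and an in-memory sample; for a huge log you would presumably pass a database file name and a real filehandle instead. Ordering by the timestamp text has the same limitation as a plain string sort: mm/dd/yy strings do not sort chronologically across years.

```perl
use strict;
use warnings;
use DBI;    # assumes DBD::SQLite is installed (not core)

# Stage 1: store every non-blank line under the most recent timestamp.
sub load_log {
    my ($dbh, $fh) = @_;
    $dbh->do('CREATE TABLE IF NOT EXISTS log_lines (ts TEXT, line_no INTEGER, line TEXT)');
    my $insert = $dbh->prepare('INSERT INTO log_lines (ts, line_no, line) VALUES (?, ?, ?)');

    my ($ts, $n) = ('', 0);
    $dbh->begin_work;    # a single transaction keeps bulk inserts fast
    while (my $line = <$fh>) {
        $n++;
        chomp $line;
        if ($line =~ m{^\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d{4}$}) {
            $ts = $line;                          # timestamp line
        }
        elsif ($line =~ /\S/) {
            $insert->execute($ts, $n, $line);     # non-blank text line
        }
    }
    $dbh->commit;
}

# Stage 2: read the rows back grouped by timestamp, in original line order.
sub grouped_text {
    my ($dbh) = @_;
    my $rows = $dbh->selectall_arrayref(
        'SELECT ts, line FROM log_lines ORDER BY ts, line_no');
    my ($out, $current) = ('', undef);
    for my $row (@$rows) {
        my ($ts, $line) = @$row;
        if (!defined $current || $ts ne $current) {
            $out .= "\n" if defined $current;
            $out .= "$ts\n";
            $current = $ts;
        }
        $out .= "$line\n";
    }
    return $out;
}

# Demo with hypothetical data.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });
my $sample = "01/01/70 12:00:00:0004\nThis is line 3\n\n"
           . "01/01/70 12:00:00:0001\nThis is line 1\n\n"
           . "01/01/70 12:00:00:0004\nThis is line 6\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
load_log($dbh, $fh);
print grouped_text($dbh);
```

Once the rows are in the table, other outputs (a single timestamp, a date range, counts per timestamp) are just different SELECT statements.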

Answer 3

+3 A:

I've had to do this task before on some very large files and the timestamps did not come in order. I didn't want to store it all in memory. I accomplished the task by using a three-pass solution:

1. Tag each input line with its timestamp and save it in a temp file
2. Sort the temp file with a fast sorter, like sort(1)
3. Turn the sorted file back into the starting format

This was fast enough for my task where I could let it run while I went for a cup of coffee, but you might have to do something more fancy if you need the results really quickly.

use strict;
use warnings;
use File::Temp qw(tempfile);

my( $temp_fh, $temp_filename )  = tempfile( UNLINK => 1 );

# read each line, tag with timestamp, and write to temp file
# will sort and undo later.
my $current_timestamp = '';
LINE: while( <DATA> )
    {
    chomp;

    if( m|^\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d\d\d\d$| ) # timestamp line
     {
     $current_timestamp = $_;
     next LINE;
     }
    elsif( m|\S| ) # line with non-whitespace (not a "blank line")
     {
     print $temp_fh "[$current_timestamp] $_\n";
     }
    else # blank lines
     {
     next LINE;
     }
    }

close $temp_fh;

# sort the file by lines using some very fast sorter
system( "sort", qw(-o sorted.txt), $temp_filename ) == 0
    or die "sort failed with status $?";

# read the sorted file and turn back into starting format
open my($in), "<", 'sorted.txt' or die "Could not read sorted.txt: $!";

$current_timestamp = '';
while( <$in> )
    {
    my( $timestamp, $line ) = m/\[(.*?)] (.*)/;
    if( $timestamp ne $current_timestamp )
     {
     $current_timestamp = $timestamp;
     print $/, $timestamp, $/;
     }

    print $line, $/;
    }

unlink $temp_filename, 'sorted.txt';

__END__
01/01/70 12:00:00:0004
This is line 3
This is line 4
This is line 5

01/01/70 12:00:00:0001
This is line 1
This is line 2


01/01/70 12:00:00:0004
This is line 6
This is line 7

brian d foy 2009-06-12 22:27:03

You don't need to save the 'sorted.txt' file; doing so can introduce a security issue. You can use the '-|' form of open to read the output of sort directly.

Hynek -Pichi- Vychodil 2009-06-13 07:48:46
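The pipe form of open mentioned in the comment above could look something like this sketch. It assumes a Unix-like system with sort on the PATH, and the tagged sample lines are invented for illustration; only the "[timestamp] text" layout is taken from the answer. The list form of open with '-|' runs sort without going through a shell, so no sorted.txt is written.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical tagged lines in the "[timestamp] text" layout.
my ($temp_fh, $temp_filename) = tempfile(UNLINK => 1);
print {$temp_fh} "[01/01/70 12:00:00:0004] This is line 3\n",
                 "[01/01/70 12:00:00:0001] This is line 1\n",
                 "[01/01/70 12:00:00:0004] This is line 6\n";
close $temp_fh or die "Could not close $temp_filename: $!";

# Read the output of sort through a pipe instead of a second file.
open my $sorted, '-|', 'sort', $temp_filename
    or die "Could not start sort: $!";

my @output;
my $current_timestamp = '';
while (my $record = <$sorted>) {
    # Skip anything that does not look like a tagged line.
    my ($timestamp, $line) = $record =~ m/\[(.*?)] (.*)/ or next;
    if ($timestamp ne $current_timestamp) {
        $current_timestamp = $timestamp;
        push @output, "\n$timestamp\n";    # start a new group
    }
    push @output, "$line\n";
}
close $sorted or die "sort failed: $! (exit status $?)";

print @output;
```

The trade-off raised in the reply still applies: with a pipe there is no intermediate file to inspect or reuse if the sort takes a long time.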

It's sometimes nice to have the intermediate result (especially if it takes a long time to make it), but it's not a big deal. I'm not sure what security issue you think there is, but there would be other things to worry about.

brian d foy 2009-06-13 20:55:57

Awesome. This script worked. I had to tweak the regex for timestamp a bit to match mine, but got it to work. Thanks a bunch!

Ranjith 2009-06-15 19:14:47

Answer 4

A:

Consider this solution...

#!/usr/bin/perl

use strict;

my (%time, $id);
while (<DATA>) {
    if ( /^mm/ ... /\n\n/ ) {
        chomp;
        s/^mm\/dd\/yy\s(.*)// and $id = $1;
        next if ( /^mm/ || /^$/ );
        push (@{$time{$id}}, $_);
    }
}

for my $i ( keys %time ) {
    print "mm/dd/yy $i\n";
    for my $j ( @{$time{$i}} ) {
        print "$j\n";
    }
    print "\n";
}

__DATA__
mm/dd/yy 12:00:00:0001
This is line 1
This is line 2

mm/dd/yy 12:00:00:0004
This is line 3
This is line 4
This is line 5


mm/dd/yy 12:00:00:0004
This is line 6
This is line 7

bichonfrise74 2009-06-13 00:25:32

Thanks...But this doesn't seem to combine lines 3-5 with 6 and 7.

Ranjith 2009-06-15 18:13:55
