I have a text file (basically an error log with date, timestamp and some data) in the following pattern:

mm/dd/yy 12:00:00:0001  
This is line 1
This is line 2

mm/dd/yy 12:00:00:0004  
This is line 3
This is line 4
This is line 5


mm/dd/yy 12:00:00:0004
This is line 6
This is line 7

I'm new to Perl and need to write a script that searches the file for timestamps and merges the entries that share the same timestamp.

I'm expecting the following output for the above sample.

mm/dd/yy 12:00:00:0001  
This is line 1
This is line 2

mm/dd/yy 12:00:00:0004  
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7

What's the best way to get this done?

+2  A: 

If the log file is not too large to keep in memory, you can just keep a hash of date string => text. Something like this:

my %h;
my $cur = "*** No date ***";
while(<>) {
  if (m"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d{4})") {
    $cur = $1;
  } else {
    $h{$cur} .= $_ unless /^\s*$/;
  }
}

print "$_\n$h{$_}\n" foreach (sort keys %h);

Try saving this as t.pl and running it as perl t.pl < yourlog.txt. Adjust the regex if needed.
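A variant of the same hash idea, as a minimal sketch: it remembers the order in which timestamps first appear instead of sorting the keys, since a plain string sort on mm/dd/yy does not order entries correctly across years. The sub name merge_by_timestamp and the filled-in 01/01/70 dates are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: merge log entries that share a timestamp.
# Takes a filehandle and returns the merged text as one string.
sub merge_by_timestamp {
    my ($fh) = @_;
    my (%text, @order);
    my $cur = '*** No date ***';    # key for data seen before any timestamp

    while (my $line = <$fh>) {
        if ($line =~ m{^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d\d\d\d)}) {
            $cur = $1;              # the capture stops before trailing whitespace
            next;
        }
        next unless $line =~ /\S/;  # skip blank lines

        # Record each timestamp once, in the order it is first seen.
        push @order, $cur unless exists $text{$cur};
        $text{$cur} .= $line;
    }

    # One block per timestamp, blocks separated by a blank line.
    return join "\n", map { "$_\n$text{$_}" } @order;
}

# Demo on an in-memory copy of the sample from the question.
my $sample = join '',
    "01/01/70 12:00:00:0001  \n",   # trailing spaces, as in the question
    "This is line 1\n",
    "This is line 2\n",
    "\n",
    "01/01/70 12:00:00:0004  \n",
    "This is line 3\n",
    "This is line 4\n",
    "This is line 5\n",
    "\n",
    "\n",
    "01/01/70 12:00:00:0004\n",
    "This is line 6\n",
    "This is line 7\n";

open my $sample_fh, '<', \$sample or die "Cannot open in-memory file: $!";
print merge_by_timestamp($sample_fh);
```

To run it on a real log, open the file and pass that handle in place of the in-memory one. Like the original, it keeps all of the text in memory.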

Igor Krivokon
That's a few too many toothpicks for me: m{^( \d{2} / \d{2} / \d{2} \s \d{2} : \d{2} : \d{2} : \d{4} ) }x
Sinan Ünür
Sinan, \d\d is shorter than \d{2} and looks better to me. As for switching to m with a different delimiter to avoid escaping the slashes, that makes sense. Thanks, editing my answer.
Igor Krivokon
\d may be shorter, but it's far less readable.
James Thompson
+1  A: 

It may be a good idea to do this in two stages if the input is huge: first create a SQLite database with a single table that has columns for the timestamp and the line (and maybe the line number and file name). Then you can output the data any which way you want.
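A rough sketch of that two-stage idea, assuming the DBI module and the DBD::SQLite driver are installed from CPAN. The table name log, its columns, and the helper names load_log and dump_merged are all made up for illustration, and the demo uses an in-memory database where a real run on a huge file would use a database file on disk.

```perl
use strict;
use warnings;
use DBI;    # assumes the DBD::SQLite driver is installed

# Stage 1 (hypothetical helper): store every data line with its timestamp.
sub load_log {
    my ($dbh, $fh) = @_;
    $dbh->do('CREATE TABLE IF NOT EXISTS log (ts TEXT, line_no INTEGER, line TEXT)');
    my $insert = $dbh->prepare('INSERT INTO log (ts, line_no, line) VALUES (?, ?, ?)');

    my ($ts, $n) = ('', 0);
    $dbh->begin_work;               # one transaction keeps bulk inserts fast
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ m{^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d\d\d\d)}) {
            $ts = $1;
            next;
        }
        next unless $line =~ /\S/;  # skip blank lines
        $insert->execute($ts, ++$n, $line);
    }
    $dbh->commit;
    return $n;                      # number of data lines stored
}

# Stage 2 (hypothetical helper): read the rows back grouped by timestamp.
sub dump_merged {
    my ($dbh) = @_;
    my $rows = $dbh->selectall_arrayref(
        'SELECT ts, line FROM log ORDER BY ts, line_no');

    my ($out, $prev) = ('', undef);
    for my $row (@$rows) {
        my ($ts, $line) = @$row;
        if (!defined $prev or $ts ne $prev) {
            $out .= "\n" if defined $prev;   # blank line between blocks
            $out .= "$ts\n";
            $prev = $ts;
        }
        $out .= "$line\n";
    }
    return $out;
}

# Demo with an in-memory database and an in-memory log.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0, AutoCommit => 1 });
my $log = "01/01/70 12:00:00:0004\nThis is line 3\n\n"
        . "01/01/70 12:00:00:0001\nThis is line 1\n\n"
        . "01/01/70 12:00:00:0004\nThis is line 6\n";
open my $log_fh, '<', \$log or die "Cannot open in-memory file: $!";
load_log($dbh, $log_fh);
print dump_merged($dbh);
```

The line_no column keeps lines in their original order within each timestamp. The ORDER BY on ts is a plain string comparison, so it has the same mm/dd/yy ordering limits as sorting the strings in Perl.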

Sinan Ünür
+3  A: 

I've had to do this task before on some very large files and the timestamps did not come in order. I didn't want to store it all in memory. I accomplished the task by using a three-pass solution:

  • Tag each input line with its timestamp and save in temp file
  • Sort the temp file with a fast sorter, like sort(1)
  • Turn the sorted file back into the starting format

This was fast enough for my task where I could let it run while I went for a cup of coffee, but you might have to do something more fancy if you need the results really quickly.

use strict;
use warnings;
use File::Temp qw(tempfile);

my( $temp_fh, $temp_filename )  = tempfile( UNLINK => 1 );

# read each line, tag with timestamp, and write to temp file
# will sort and undo later.
my $current_timestamp = '';
LINE: while( <DATA> )
    {
    chomp;

    if( m|^\d\d/\d\d/\d\d \d\d:\d\d:\d\d:\d\d\d\d$| ) # timestamp line
     {
     $current_timestamp = $_;
     next LINE;
     }
    elsif( m|\S| ) # line with non-whitespace (not a "blank line")
     {
     print $temp_fh "[$current_timestamp] $_\n";
     }
    else # blank lines
     {
     next LINE;
     }
    }

close $temp_fh;

# sort the file by lines using some very fast sorter
system( "sort", qw(-o sorted.txt), $temp_filename );

# read the sorted file and turn back into starting format
open my($in), "<", 'sorted.txt' or die "Could not read sorted.txt: $!";

$current_timestamp = '';
while( <$in> )
    {
    my( $timestamp, $line ) = m/\[(.*?)] (.*)/;
    if( $timestamp ne $current_timestamp )
     {
     $current_timestamp = $timestamp;
     print $/, $timestamp, $/;
     }

    print $line, $/;
    }

unlink $temp_filename, 'sorted.txt';

__END__
01/01/70 12:00:00:0004
This is line 3
This is line 4
This is line 5

01/01/70 12:00:00:0001
This is line 1
This is line 2


01/01/70 12:00:00:0004
This is line 6
This is line 7
brian d foy
You don't need to save a 'sorted.txt' file; doing so can introduce a security issue. You can use the '-|' form of open to read sort's output directly.
Hynek -Pichi- Vychodil
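The piped-open suggestion in the comment above might look like the following sketch. It assumes a Unix-like system with sort(1) on the PATH and a temp file that already holds "[timestamp] text" lines as written by the first pass. The name merge_via_sort_pipe is hypothetical, and forcing LC_ALL=C is an added assumption to keep the sort order predictable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: rebuild the log format from "[timestamp] text" lines
# in $tagged_file, reading sort(1)'s output through a pipe.
sub merge_via_sort_pipe {
    my ($tagged_file) = @_;
    local $ENV{LC_ALL} = 'C';   # byte-wise collation for the child process

    # List-form piped open: no shell is involved and no sorted.txt is written.
    open my $sorted, '-|', 'sort', $tagged_file
        or die "Could not run sort: $!";

    my ($out, $current) = ('', undef);
    while (my $row = <$sorted>) {
        my ($timestamp, $line) = $row =~ m/^\[(.*?)\] (.*)/
            or next;            # ignore anything that is not a tagged line
        if (!defined $current or $timestamp ne $current) {
            $out .= "\n" if defined $current;   # blank line between blocks
            $out .= "$timestamp\n";
            $current = $timestamp;
        }
        $out .= "$line\n";
    }
    close $sorted or die "sort failed: $! (exit status $?)";
    return $out;
}

# Demo: write a few tagged lines out of order, then merge them.
my ($tag_fh, $tag_name) = tempfile(UNLINK => 1);
print {$tag_fh}
    "[01/01/70 12:00:00:0004] This is line 3\n",
    "[01/01/70 12:00:00:0001] This is line 1\n",
    "[01/01/70 12:00:00:0004] This is line 6\n";
close $tag_fh or die "Could not close temp file: $!";
print merge_via_sort_pipe($tag_name);
```

As in the answer, sort orders the text lines within each timestamp as strings, so this does not necessarily preserve their original order.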
It's sometimes nice to have the intermediate result (especially if it takes a long time to make it), but it's not a big deal. I'm not sure what security issue you think there is, but there would be other things to worry about.
brian d foy
Awesome. This script worked. I had to tweak the regex for timestamp a bit to match mine, but got it to work. Thanks a bunch!
Ranjith
A: 

Consider this solution...

#!/usr/bin/perl

use strict;

my (%time, $id);
while (<DATA>) {
    if ( /^mm/ ... /\n\n/ ) {
        chomp;
        s/^mm\/dd\/yy\s(.*)// and $id = $1;
        next if ( /^mm/ || /^$/ );
        push (@{$time{$id}}, $_);
    }
}

for my $i ( keys %time ) {
    print "mm/dd/yy $i\n";
    for my $j ( @{$time{$i}} ) {
        print "$j\n";
    }
    print "\n";
}

__DATA__
mm/dd/yy 12:00:00:0001
This is line 1
This is line 2

mm/dd/yy 12:00:00:0004
This is line 3
This is line 4
This is line 5


mm/dd/yy 12:00:00:0004
This is line 6
This is line 7
bichonfrise74
Thanks...But this doesn't seem to combine lines 3-5 with 6 and 7.
Ranjith
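One likely cause, assuming the real log has trailing whitespace on some timestamp lines as the sample in the question does: the (.*) capture keeps those trailing spaces, so "12:00:00:0004  " and "12:00:00:0004" become two different hash keys. A minimal sketch of a variant that strips trailing whitespace before building the key follows; group_by_time is a hypothetical name, and the literal mm/dd/yy prefix is kept from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: group data lines under the time part of each
# "mm/dd/yy <time>" header, ignoring trailing whitespace on every line.
sub group_by_time {
    my (@lines) = @_;
    my (%time, @order, $id);

    for my $line (@lines) {
        (my $text = $line) =~ s/\s+\z//;   # drop the newline and trailing blanks
        next if $text eq '';               # skip blank lines

        if ($text =~ m{^mm/dd/yy\s+(\S+)}) {
            $id = $1;                      # key without trailing whitespace
            next;
        }
        next unless defined $id;           # data before the first header

        # Remember the order in which keys first appear.
        push @order, $id unless exists $time{$id};
        push @{ $time{$id} }, $text;
    }
    return (\%time, \@order);
}

# Demo on the sample from the question, trailing spaces included.
my @sample = (
    "mm/dd/yy 12:00:00:0001  \n", "This is line 1\n", "This is line 2\n", "\n",
    "mm/dd/yy 12:00:00:0004  \n", "This is line 3\n", "This is line 4\n",
    "This is line 5\n", "\n", "\n",
    "mm/dd/yy 12:00:00:0004\n", "This is line 6\n", "This is line 7\n",
);
my ($time, $order) = group_by_time(@sample);
for my $key (@$order) {
    print "mm/dd/yy $key\n";
    print "$_\n" for @{ $time->{$key} };
    print "\n";
}
```

Keeping a separate list of keys also means the blocks print in first-seen order, which a bare keys %time does not guarantee.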