views:

135

answers:

3

Hi,

Advance New Year Wishes to All.

I have an error log file with the contents in a pattern parameter, result and stderr (stderr can be in multiple lines).

$cat error_log

<parameter>:test_tot_count
<result>:1
<stderr>:Expected "test_tot_count=2" and the actual value is 3
test_tot_count = 3
<parameter>:test_one_count
<result>:0
<stderr>:Expected "test_one_count=2" and the actual value is 0
test_one_count = 0
<parameter>:test_two_count
<result>:4
<stderr>:Expected "test_two_count=2" and the actual value is 4
test_two_count = 4
...

I need to write a function in Perl to store each parameters, result and stderr in an array or hash table.

This is our own internally defined structure. I wrote the Perl function like this. Is there a better way of doing this using regular expression itself?

my $err_msg = "";
while (<ERR_LOG>)
{
    if (/<parameter>:/)
    {
        s/<parameter>://;
        push @parameter, $_;
    }
    elsif (/<result>:/)
    {
        s/<result>://;
        push @result, $_;
    }
    elsif (/<stderr>:/)
    {
        if (length($err_msg) > 0)
        {
            push @stderr, $err_msg;
        }
        s/<stderr>://;
        $err_msg = $_;
    }
    else
    {
        $err_msg .= $_;
    }
 } 
 if (length($err_msg) > 0)
 {
    push @stderr, $err_msg;
 }
+1  A: 

Looks nice. =) An improvement is probably to anchor those tags at the beginning of the line:

if (/^<parameter>:/)

It'll make the script a bit more robust.

You can also avoid the stripping of the tag if you catch what's after it and use only that part:

if (/^<parameter>:(.*)/s)
{
  push @parameter, $1;
}
PEZ
Thanks, I missed that validation :-)
Liju Mathew
+3  A: 

The main thing that jumps out for refactoring is the repetition in the matching, stripping, and storing. Something like this (untested) code is more concise:

my( $err_msg , %data );

while (<ERR_LOG>) {
  if(( my $key ) = $_ =~ s/^<(parameter|result|stderr)>:// ) {
    if( $key eq 'stderr' ) {
      push @{ $data{$key} } , $err_msg if $err_msg;
      $err_msg = $_;
    }
    else { push @{ $data{$key} } , $_ }
  }
  else { $err_msg .= $_ }
}

# grab the last err_msg out of the hopper
push @{ $data{stderr} } , $err_msg;

... but it may be harder to understand six months from now... 8^)

genehack
+4  A: 

If you're using Perl 5.10, you can do something very similar to what you have now but with a much nicer layout by using the given/when structure:

use 5.010;

while (<ERR_LOG>) {
    chomp;
    given ($_) {
        when ( m{^<parameter>: (.*)}x )  { push @parameter, $1     }
        when ( m{^<result>:    (.*)}x )  { push @result,    $1     }
        when ( m{^<stderr>:    (.*)}x )  { push @stderr,    $1     }
        default                          { $stderr[-1]   .= "\n$_" }
    }
}

It's worth noting that for the default case here, rather than keeping a separate $err_msg variable, I'm simply pushing onto @stderr when I see a stderr tag, and appending to the last item of the @stderr array if I see a continuation line. I'm adding a newline when I see continuation lines, since I assume you want them preserved.

Despite the above code looking quite elegant, I'm not really all that fond of keeping three separate arrays, since it will presumably cause you headaches if things get out of sync, and because if you want to add more fields in the future you'll end up with lots and lots of variables floating around that you'll need to keep track of. I'd suggest storing each record inside a hash, and then keeping an array of records:

use 5.010;

my @records;

my $prev_key;

while (<ERR_LOG>) {
    chomp;
    given ($_) {
        when ( m{^<parameter>  }x ) { push(@records, {}); continue;          }
        when ( m{^<(\w+)>: (.*)}x ) { $records[-1]{$1} = $2; $prev_key = $1; }
        default                     { $records[-1]{$prev_key} .= "\n$_";     }
    }
}

Here we're pushing a new record onto the array when we see a field, adding an entry to our hash whenever we see a key/value pair, and appending to the last field we added to if we see a continuation line. The end result of @records looks like this:

(
    {
        parameter => 'test_one_count',
        result    => 0,
        stderr    => qq{Expected "test_one_count=2" and the actual value is 0\ntest_one_count=0},
    },
    {
        parameter => 'test_two_count',
        result    => 4,
        stderr    => qq{Expected "test_two_count=2" and the actual value is 4\ntest_two_count=4},
    }
)

Now you can pass just a single data structure around which contains all of your records, and you can add more fields in the future (even multi-line ones) and they'll be correctly handled.

If you're not using Perl 5.10, then this may be a good excuse to upgrade. If not, you can translate the given/when structures into more traditional if/elsif/else structures, but they lose much of their beauty in the conversion.

Paul

pjf
I like the first one. It's a nice and tight refactor. +1
Axeman
`use Switch 'Perl6'`, works on older Perls.
Brad Gilbert
The second one is **much** better.
Brad Gilbert
If you're using perl 5.10, why now use named captures, e.g. when ( m{^<(/<key>\w+)>: (?<value>.*)}x ) { $records[-1]{$+{key}} = $+{value}; $prev_key = $+{key}; }It makes the code slightly longer, but avoids the uncomfortable hard-coded $1 and $2 references.
Sam Kington
Thanks for the solution. We are using Perl v5.8.5. I implemented the code with array containing hash table concept using if else. It helped me to understand more on array concepts in Perl and moreover this has reduced the number of lines in code also.
Liju Mathew