views:

109

answers:

3

Hey, I am working on a very basic parser. I am almost certain that my regex is correct, but values do not seem to be being stored in my $1 and $2. Am I doing something wrong? I am just looking for tips to alter my code. Thanks for any advice! Also, I am new to Perl, so if I did something wrong, I am looking to get off on the right foot and develop solid habits.

Sample line from file:

Sat 02-August-2008 20:47 - 123.112.3.209 - "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;

I am just getting the hours from the times.

foreach my $line (@lines)
{   
my $match =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/;

if( $1 == 0)
{
    $times[$2] = $times[$2] + 1;
}
else
{   
    my $time = $1.$2;
    $times[$time] = $times[$time]+ 1;
}
 }


print "\n";
for(my $i=0;$i<24;$i++)
{
print "$i: $times[$i]\n";
}
+7  A: 

If you want to match on $line shouldn't the code read

$line =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/;

See here.

Brian Rasmussen
Hm, I made that change and is still not working. My $1 and $2 seem to be empty. I appreciate the tip and I think you might be right about it having to be $line, but since I am a beginner at best to Perl, I am not certain
PFranchise
I'm getting 2 and 0, from the sample line just as expected. You did get rid of `my` in front of `$line` right?
Brian Rasmussen
haha, thanks for seeing through my newness. I appreciate you taking the time to help. Have a great night.
PFranchise
Glad I could help.
Brian Rasmussen
+3  A: 

Can you give some example of what kind of pattern you are try to match? Otherwise I won't be able to tell if your regex matches your pattern or not. However there are some improvements you can make about your code:

First off, always test if a match is successful if you want to use $1, $2 etc

if($match =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/) {

    if( $1 == 0)
    {
        $times[$2] = $times[$2] + 1;
    }
    else
    {   
        my $time = $1.$2;
        $times[$time] = $times[$time]+ 1;
    }
} else {
    warn "no match!\n";
}

Second, always use the '-w' switch. In this case, you will probably get the warning message about $1 and $2 are not initialized due to failed match:

#!/usr/bin/perl -w
Dasvid
Thanks for the tips. I really appreciate it. The other fellow's answer solved my issue, so there is no need to look into my issue in more detail. But, I will incorporate your tips into my code. Have a great evening
PFranchise
Glad I helped. You, too!
Dasvid
+1  A: 

First, if you are new to Perl, one of the strengths is CPAN and the many solutions there. Don't reinvent the wheel!

There is a great module called Date::Parse that will parse the time part for you. Then the only regex problem that you have is separating out the time part of your line.

Based on your one line sample, this code will do that:

use strict;
use warnings;

use Date::Parse;

my $line="Sat 02-August-2008 20:47 - 123.112.3.209 - \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;";
my $tmpart;

if ($line=~ /^(.*\d+:\d+) -/) {
    $tmpart=$1;

    print "Time part = $tmpart\n";

    my $time=str2time($tmpart);
    my ($ss,$mm,$hh,$day,$month,$year,$zone) = strptime($tmpart);

    $year+=1900;
    $month+=1;

    print "Unix time: $time\n";
    print "Parsed time: $month/$day/$year $hh:$mm:$ss  \n\n";
} 
else {
   warn "no match!\n";
}   

This will return a Unix time number that is then easy to work with. Or (as shown) you can parse the individual components of the time.

drewk
This line creates some problems: `my $tmpart = $1 if $line=~ /^(.*\d+:\d+) -/;` If your regex fails to match, `$tmpart` is undefined, and you will get a bunch of warnings and bogus results.
daotoad
Easy fix with some error logic code. Done! Thanks for pointing it out.
drewk