I am working on a small Perl program that opens a web page, searches for the words "Hail Reports", and gives me back the information that follows. I am very new to Perl, so some of this might be simple to fix. First, my code warns that I am using an uninitialized value. Here is what I have:

#!/usr/bin/perl -w
use LWP::Simple;

my $html = get("http://www.spc.noaa.gov/climo/reports/last3hours.html")
    or die "Could not fetch NWS page.";
$html =~ m{Hail Reports} || die;
my $hail = $1;
print "$hail\n";

Secondly, I thought regular expressions would be the easiest way to do what I want, but I am not sure they can do it. I want my program to search for the words "Hail Reports" and send me back everything between "Hail Reports" and "Wind Reports". Is this possible with regular expressions, or should I be using a different method? Here is the snippet of the web page's source that I want it to send back:

     <tr><th colspan="8">Hail Reports (<a href="last3hours_hail.csv">CSV</a>)&nbsp;(<a href="last3hours_raw_hail.csv">Raw Hail CSV</a>)(<a href="/faq/#6.10">?</a>)</th></tr> 

#The Data here will change throughout the day so normally there will be more info.
      <tr><td colspan="8" class="highlight" align="center">No reports received</td></tr> 
      <tr><th colspan="8">Wind Reports (<a href="last3hours_wind.csv">CSV</a>)&nbsp;(<a href="last3hours_raw_wind.csv">Raw Wind CSV</a>)(<a href="/faq/#6.10">?</a>)</th></tr> 
+2  A: 

You were capturing nothing in $1 because no part of your regex was enclosed in parentheses. The following works for me.

#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple;

my $html = get("http://www.spc.noaa.gov/climo/reports/last3hours.html")
    or die "Could not fetch NWS page.";

$html =~ m{Hail Reports(.*)Wind Reports}s || die; #Parentheses indicate capture group
my $hail = $1; # $1 contains whatever matched in the (.*) part of above regex
print "$hail\n";
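
To try the capture without fetching the live page, here is a minimal sketch that runs the same regex against an inline string. The sample HTML is an abbreviated stand-in for the snippet in the question, and the helper name hail_section is hypothetical, not part of LWP::Simple or the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the two headings,
# or undef when the pattern does not match.
sub hail_section {
    my ($html) = @_;
    # /s lets . match newlines; the parentheses fill $1 on success.
    return $html =~ m{Hail Reports(.*)Wind Reports}s ? $1 : undef;
}

# Abbreviated stand-in for the real page (an assumption for offline testing).
my $sample = join "\n",
    '<tr><th colspan="8">Hail Reports</th></tr>',
    '<tr><td colspan="8">No reports received</td></tr>',
    '<tr><th colspan="8">Wind Reports</th></tr>';

my $hail = hail_section($sample);
print defined $hail ? "$hail\n" : "No hail section found\n";
```

Checking the match result before reading $1 avoids the uninitialized-value warning when the page layout changes and the pattern no longer matches.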
d5e5
Thanks, that covers both problems nicely.
shinjuo
+1  A: 

Parentheses capture substrings in regular expressions. You have no parentheses in your regex, so $1 is not set to anything. If you had:

$html =~ m{(Hail Reports)} || die;

Then $1 would be set to "Hail Reports" if that text appears in $html. If you only want to know whether it matched, you don't need to capture anything at this point, and you could write something like:

unless ( $html =~ /Hail Reports/ ) {
  die "No Hail Reports in HTML";
}

To capture something between the strings you can do something like:

if ( $html =~ /(?<=Hail Reports)(.*?)(?=Wind Reports)/s ) {
  print "Got $1\n";
}
runrig
You need the 's' modifier on the regex so that . matches newlines, i.e. =~ /.../s
c-urchin
Thanks. updated.
runrig
+3  A: 

The uninitialized-value warning comes from $1: your regex has no capture group, so $1 is never set.

For a line-level rather than byte-level "between", you could use:

for (split(/\n/, $html)) {
    # split removed the newlines, so add one back when printing
    print "$_\n" if (/Hail Reports/ .. /Wind Reports/ and !/(?:Hail|Wind) Reports/);
}
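
As a rough illustration of how the range operator behaves here, the sketch below applies the same test to a few made-up lines. The sample lines are assumptions modeled on the question's snippet, and grep is used instead of print so the result can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up lines standing in for the page source (an assumption).
my @lines = (
    '<tr><th>Hail Reports</th></tr>',
    '<tr><td>No reports received</td></tr>',
    '<tr><th>Wind Reports</th></tr>',
    '<tr><td>Something else</td></tr>',
);

# In scalar context, .. is true from the line matching the left pattern
# through the line matching the right pattern. The second test drops
# the two heading lines themselves.
my @between = grep {
    (/Hail Reports/ .. /Wind Reports/) && !/(?:Hail|Wind) Reports/
} @lines;

print "$_\n" for @between;
```

Only the "No reports received" row is kept: the heading rows are filtered out by the second test, and the last row falls after the range has closed.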
Jim Davis
+3  A: 

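This uses the /s modifier so that . matches newlines. The /m modifier only changes how ^ and $ behave, so it is harmless here but not needed. The non-greedy .*? stops at the first "Wind Reports" rather than the last, which should also be a little faster than a greedy match.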

#!/usr/bin/perl -w

use strict;
use LWP::Simple;

   sub main{
      my $html = get("http://www.spc.noaa.gov/climo/reports/last3hours.html")
                 or die "Could not fetch NWS page.";

      # match single and multiple lines + not greedy
      my ($hail, $between, $wind) = $html =~ m/(Hail Reports)(.*?)(Wind Reports)/sm
                 or die "No Hail/Wind Reports";

      print qq{
               Hail:         $hail
               Wind:         $wind
               Between Text: $between
            };
   }

   main();
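
To show what the non-greedy quantifier changes, here is a small sketch comparing .* and .*? on a made-up string that contains "Wind Reports" twice. The input string is an assumption for illustration only; the real page may contain the phrase just once, in which case both forms capture the same text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input with two "Wind Reports" markers (an assumption).
my $text = "Hail Reports A Wind Reports B Wind Reports";

# Greedy: .* runs to the LAST "Wind Reports".
my ($greedy) = $text =~ /Hail Reports(.*)Wind Reports/s;

# Non-greedy: .*? stops at the FIRST "Wind Reports".
my ($lazy) = $text =~ /Hail Reports(.*?)Wind Reports/s;

print "greedy: [$greedy]\n";   # greedy: [ A Wind Reports B ]
print "lazy:   [$lazy]\n";     # lazy:   [ A ]
```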
Armando