ansaurus

Question

Answer 1

+2 A:

You were capturing nothing in $1 because no part of your regex was enclosed in parentheses. The following works for me.

#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple;

my $html = get("http://www.spc.noaa.gov/climo/reports/last3hours.html")
    or die "Could not fetch NWS page.";

$html =~ m{Hail Reports(.*)Wind Reports}s || die; #Parentheses indicate capture group
my $hail = $1; # $1 contains whatever matched in the (.*) part of above regex
print "$hail\n";
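
To try the capture without fetching anything, here is a minimal sketch that uses a made-up sample string in place of the page. The sample text and the extract_between helper are assumptions for illustration, not the real NWS markup. It also shows how a greedy (.*) and a non-greedy (.*?) differ when the end marker occurs more than once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text standing in for the fetched page.
my $sample = "Hail Reports\nrow A\nWind Reports\nrow B\nWind Reports\n";

# Hypothetical helper: return the text between the two markers, or undef.
sub extract_between {
    my ($text, $greedy) = @_;
    my $re = $greedy
        ? qr{Hail Reports(.*)Wind Reports}s     # runs to the LAST "Wind Reports"
        : qr{Hail Reports(.*?)Wind Reports}s;   # stops at the FIRST "Wind Reports"
    return $text =~ $re ? $1 : undef;
}

my $greedy = extract_between($sample, 1);
my $lazy   = extract_between($sample, 0);
print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

If the page contains "Wind Reports" only once, both forms capture the same text.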

d5e5 2010-07-02 19:56:44

Thanks, that covers both problems nicely.

shinjuo 2010-07-02 20:02:07

Answer 2

+1 A:

Parentheses capture strings in regular expressions. You have no parentheses in your regex, so $1 is not set to anything. If you had:

$html =~ m{(Hail Reports)} || die;

Then $1 would be set to "Hail Reports" if that string occurs in $html. Since you only want to know whether it matched, you don't need to capture anything at this point and could write something like:

unless ( $html =~ /Hail Reports/ ) {
  die "No Hail Reports in HTML";
}

To capture something between the strings you can do something like:

if ( $html =~ /(?<=Hail Reports)(.*?)(?=Wind Reports)/s ) {
  print "Got $1\n";
}
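
A small sketch of why the /s modifier matters for this pattern, using a made-up sample string (an assumption, not the real page) in which the two markers sit on different lines:

```perl
use strict;
use warnings;

# Hypothetical sample with the two markers on different lines.
my $sample = "Hail Reports\n1 inch hail\nWind Reports\n";

# Without /s, "." does not match "\n", so the match fails here.
my $without_s = $sample =~ /(?<=Hail Reports)(.*?)(?=Wind Reports)/ ? 1 : 0;

# With /s, "." also matches "\n" and the capture spans the lines in between.
my ($between) = $sample =~ /(?<=Hail Reports)(.*?)(?=Wind Reports)/s;

print "matched without /s: $without_s\n";
print "between with /s: [$between]\n";
```

Testing the match in an if, as above, also avoids reading a stale or undefined $1 when the pattern does not match.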

runrig 2010-07-02 19:57:06

You need the 's' modifier on the regex so that '.' matches newlines, i.e. =~ /.../s

c-urchin 2010-07-02 20:02:50

Thanks. Updated.

runrig 2010-07-02 20:05:25

Answer 3

+3 A:

The uninitialized-value warning is coming from $1 -- it's not defined or set anywhere.

For a line-level rather than a byte-level "between", you could use:

for (split(/\n/, $html)) {
    # split removed the newlines, so add one back when printing each line
    print "$_\n" if (/Hail Reports/ .. /Wind Reports/ and !/(?:Hail|Wind) Reports/);
}
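
The same range-operator idea as a self-contained sketch that runs on made-up sample lines. The sample data and the lines_between helper are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the page text.
my $sample = join "\n",
    'header', 'Hail Reports', 'hail row 1', 'hail row 2',
    'Wind Reports', 'wind row 1';

# Hypothetical helper: collect lines strictly between the two marker lines.
sub lines_between {
    my ($text) = @_;
    my @kept;
    for my $line (split /\n/, $text) {
        # In scalar context, ".." is true from the first line matching the
        # left pattern through the line matching the right pattern.
        if ($line =~ /Hail Reports/ .. $line =~ /Wind Reports/) {
            next if $line =~ /(?:Hail|Wind) Reports/;   # drop the marker lines
            push @kept, $line;
        }
    }
    return @kept;
}

print "$_\n" for lines_between($sample);
```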

Jim Davis 2010-07-02 20:03:32

Answer 4

+3 A:

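This uses the /s modifier so that . matches newlines; /m is also set, but it only affects ^ and $, which this pattern does not use. The non-greedy (.*?) stops at the first "Wind Reports" after "Hail Reports" instead of running to the last one, which may also be a little faster than a greedy match.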

#!/usr/bin/perl -w

use strict;
use LWP::Simple;

sub main {
    my $html = get("http://www.spc.noaa.gov/climo/reports/last3hours.html")
        or die "Could not fetch NWS page.";

    # /s lets . match newlines; .*? is non-greedy, so it stops at the first "Wind Reports"
    my ($hail, $between, $wind) = $html =~ m/(Hail Reports)(.*?)(Wind Reports)/sm
        or die "No Hail/Wind Reports";

    print qq{
        Hail:         $hail
        Wind:         $wind
        Between Text: $between
    };
}

main();
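
A quick check of what /s and /m each do, as a sketch on a made-up sample string (an assumption, not the real page). Because the pattern above has no ^ or $, dropping /m gives the same captures. /m only matters once an anchor is present.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the fetched page.
my $sample = "intro\nHail Reports\nrow 1\nWind Reports\n";

# Same pattern with and without /m: no ^ or $ in it, so /m changes nothing.
my @with_m    = $sample =~ m/(Hail Reports)(.*?)(Wind Reports)/sm;
my @without_m = $sample =~ m/(Hail Reports)(.*?)(Wind Reports)/s;

# With an anchor, /m lets ^ match at the start of any line.
my $anchored_plain = $sample =~ /^Hail Reports/  ? 1 : 0;   # ^ only at string start
my $anchored_m     = $sample =~ /^Hail Reports/m ? 1 : 0;   # ^ after any newline too

print "same captures: ", ("@with_m" eq "@without_m" ? "yes" : "no"), "\n";
print "anchored without /m: $anchored_plain, with /m: $anchored_m\n";
```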

Armando 2010-07-03 00:48:14


A little help with perl HTML parsing