I'm new to using Perl's HTML::TreeBuilder module for HTML parsing and can't figure out what is going wrong here. I have spent a few hours trying to get this to work and looked at a few tutorials, but I still get this error: "Use of uninitialized value in pattern match", referring to this line in my code:

sub { $_[0]->tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/) }

This error prints out many times in the terminal. I have checked everything over and over, and the code is definitely getting its input: $downloadedpage holds a full HTML file that contains the string I give below. Any advice is greatly appreciated.

Sample string, contained within the $downloadedpage variable:

        <div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos, quesadillas, enchiladas and barbacoa are consistently explored for options by some of the world&#8217;s foremost gourmet chefs. A celebration of spices and unique culinary trends, Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>

My code:

    my $tree = HTML::TreeBuilder->new();
    $tree->parse($downloadedpage);
    $tree->eof();

    # the article is in the div with class "snap_preview"
    @article = $tree->look_down(
        sub { $_[0]->tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/) }
    );
+2  A: 

Using the exact code and example you gave,

use warnings;
use strict;
use HTML::TreeBuilder;
my $downloadedpage=<<EOF;
<div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos, quesadillas, enchiladas and barbacoa are consistently explored for options by some of the world&#8217;s foremost gourmet chefs. A celebration of spices and unique culinary trends, Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>
EOF

my $tree = HTML::TreeBuilder->new();
$tree->parse($downloadedpage);
$tree->eof();

# the article is in the div with class "snap_preview"
my @article = $tree->look_down(
    sub { $_[0]->tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/) }
);

I don't get any errors at all. My first guess would be that there are some <div>s in the HTML which don't have a class attribute.
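
To see why a missing class would produce that warning: attr() returns undef when the element has no such attribute, and matching undef against a regex is what Perl complains about. Here is a minimal sketch in plain Perl, with no HTML involved. The variable $class is only a stand-in for the return value of attr('class'), and the warning handler is an assumption added just to count the warnings:

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Stand-in for $_[0]->attr('class') on a <div> that has no class attribute.
my $class = undef;

# Unguarded match: triggers "Use of uninitialized value ... in pattern match".
my $unguarded = ($class =~ /snap_preview/) ? 1 : 0;

# Guarded match: the regex is never evaluated when $class is undef.
my $guarded = (defined $class && $class =~ /snap_preview/) ? 1 : 0;

print "warnings seen: ", scalar(@warnings), "\n";
print $warnings[0];
```

Only the unguarded match should add an entry to @warnings, which is the same thing that happens once per class-less <div> in your page.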

Maybe you need to write

sub {
    $_[0]->tag() eq 'div' and
    $_[0]->attr('class') and
    ($_[0]->attr('class') =~ /snap_preview/)
}

there?
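
Another option, as far as I know from the HTML::Element documentation, is to skip the coderef and pass attribute criteria straight to look_down(). A regex criterion only matches elements that actually have the attribute, so divs without a class should be skipped without any warning. A small sketch, where the three-div document is made up and only stands in for your page:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up input: one div without a class, the target div, and one other div.
my $html = '<div>no class here</div>'
         . '<div class="snap_preview"><p>Article text</p></div>'
         . '<div class="possibly-related">links</div>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# _tag selects the element name; class => qr/.../ matches on the attribute value.
my @article = $tree->look_down(
    _tag  => 'div',
    class => qr/snap_preview/,
);

print scalar(@article), " match(es)\n";
print $article[0]->as_text, "\n" if @article;
```

This does the same job as the guarded coderef, just with less code to get wrong.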

Kinopiko
Yes, you are right, there are <div>s without a class attribute in the HTML doc, so I guess that's what is causing the error. Should it still be able to find my specific div, then? Or do I need to do something else to make it skip over the ones without a class?
Rick
Yes, you need to test for that. Please take a look at my edit.
Kinopiko
Great, thanks. I will try that.
Rick
Thanks, that works, and I understand better now how TreeBuilder works.
Rick