ansaurus

Question

How can I make WWW::Mechanize not fetch pages twice?

Answer 1

+3 A:

You can subclass WWW::Mechanize and redefine the get() method to do what you want:

package MyMech;
use base 'WWW::Mechanize';

sub get {
    my $self = shift;
    my($url) = @_;

    if (defined $self->res && $self->res->request->uri ne $url) {
        return $self->SUPER::get(@_)
    }
    return $self->res;
}

eugene y 2010-03-25 12:14:48

If get() has not been called yet, $self->res is undefined, and this throws 'Can't call method "request" on an undefined value' on the first get. Change the 4th line of sub get to

if (!$self->res || $self->res->request->uri ne $url) {

so that the first get() can go through.

MkV 2010-03-26 07:25:40

@james2vegas: Surely.

eugene y 2010-03-26 09:27:02
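With the correction from the comments folded in, the subclass might look like the sketch below. It is only an illustration: it assumes Perl 5.14 or later for the package block syntax, and it only remembers the most recent request, so it avoids a refetch only when the same URL is requested twice in a row and no redirect changed the final URI.

```perl
package MyMech {
    use strict;
    use warnings;
    use parent 'WWW::Mechanize';

    # Return the previous response when the same URL is requested twice in
    # a row; otherwise fall through to the normal WWW::Mechanize::get().
    sub get {
        my ($self, $url, @args) = @_;

        my $last = $self->res;    # undef until the first request is made
        if ($last && $last->request && $last->request->uri eq $url) {
            return $last;
        }
        return $self->SUPER::get($url, @args);
    }
}
```

Calling MyMech->new and then get() on the same URL twice should issue one HTTP request; a different URL in between resets the comparison, because only the last response is kept.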

Answer 2

+3 A:

You can store the response for each URL in a hash keyed by URL, and fetch only when there is no entry yet.

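use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url = 'http://google.com';
my %response;    # URL => HTTP::Response returned by get()

# Fetch only if this URL has not been fetched before
$response{$url} = $mech->get($url) unless $response{$url};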

rarbox 2010-03-25 12:30:25
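The same idea can be wrapped in a small helper so that every fetch goes through the hash. This is a rough sketch; fetch_once is a made-up name, not part of WWW::Mechanize, and the cache lives only for the lifetime of the script.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my %response;    # URL => HTTP::Response from the first fetch

# fetch_once() is a hypothetical helper, not a WWW::Mechanize method.
# It calls get() only the first time a given URL is seen and returns
# the stored response on every later call.
sub fetch_once {
    my ($mech, $url) = @_;
    $response{$url} //= $mech->get($url);
    return $response{$url};
}
```

On a cache hit the Mechanize object itself is not touched, so $mech->content still reflects the last page that was really fetched; read the page from the returned response (for example with decoded_content) instead.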

Answer 3

+6 A:

See WWW::Mechanize::Cached:

Synopsis

use WWW::Mechanize::Cached;

my $cacher = WWW::Mechanize::Cached->new;
$cacher->get( $url );

Description

Uses the Cache::Cache hierarchy to implement a caching Mech. This lets one perform repeated requests without hammering a server impolitely.

Sinan Ünür 2010-03-25 14:30:00
