I am fetching some pages over the Web using Perl's LWP::UserAgent and would like to be as polite as possible. By default, LWP::UserAgent does not seamlessly handle compressed content via gzip. Is there an easy way to make it do so, to save everyone some bandwidth?

+8  A: 

LWP has this capability built in, thanks to HTTP::Message. But it's a bit hidden.

First make sure you have Compress::Zlib installed so you can handle gzip. HTTP::Message::decodable() returns the list of encodings it can decode, based on the modules you have installed; in scalar context, the result is a comma-delimited string that you can use as the value of the 'Accept-Encoding' HTTP header, which LWP requires you to add to your HTTP::Request objects yourself. (On my system, with Compress::Zlib installed, the list is "gzip, x-gzip, deflate".)

When your HTTP::Response comes back, be sure to access the content with $response->decoded_content instead of $response->content.
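
To see the difference between the two accessors without touching the network, here is a minimal sketch that builds a gzip-encoded HTTP::Response by hand. It assumes IO::Compress::Gzip is installed; the body text and the hand-built response are illustrative only and not part of the original answer.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Illustrative payload: repetitive text so that it compresses well.
my $body = "hello, compressed world\n" x 20;
gzip(\$body => \my $gzipped) or die "gzip failed: $GzipError";

# Hand-built response that claims its body is gzip-encoded.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain', 'Content-Encoding' => 'gzip' ],
    $gzipped,
);

# content() returns the raw (still compressed) bytes;
# decoded_content() undoes the Content-Encoding first.
printf "content:         %d bytes\n", length $response->content;
printf "decoded_content: %d bytes\n", length $response->decoded_content;
```

The first number should be noticeably smaller than the second, since content still holds the compressed bytes.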

In LWP::UserAgent, it all comes together like this:

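use strict;
use warnings;
use LWP::UserAgent;   # also loads HTTP::Message

my $ua = LWP::UserAgent->new;
# Scalar context: comma-separated list of encodings we can decode
my $can_accept = HTTP::Message::decodable();
my $response = $ua->get('http://stackoverflow.com/feeds',
    'Accept-Encoding' => $can_accept,
);
print $response->decoded_content;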

decoded_content also decodes the text into Perl's Unicode strings. If you only want LWP to uncompress the response and leave the character encoding alone, call it like this:

print $response->decoded_content(charset => 'none');
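
If you make many requests, you may prefer to set the header once on the agent instead of passing it to every get call. The following is a sketch under that assumption, using LWP::UserAgent's default_header method; make_polite_agent is a hypothetical helper name, not something LWP provides.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Message;

# make_polite_agent is a hypothetical helper name, not part of LWP.
sub make_polite_agent {
    my $ua = LWP::UserAgent->new;
    # In scalar context decodable() returns a comma-separated string.
    my $can_accept = HTTP::Message::decodable();
    # default_header() adds the header to every request this agent sends.
    $ua->default_header('Accept-Encoding' => $can_accept);
    return $ua;
}

my $ua = make_polite_agent();
print "Accept-Encoding: ", $ua->default_header('Accept-Encoding'), "\n";
```

Requests made with this agent, such as $ua->get($url), then advertise the supported encodings automatically; you still read the body with $response->decoded_content.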
Ryan Tate
Note: This works with LWP 5.814 (released July 2008) or newer.
Ryan Tate
A: 

Why, when I run this code and add the line "print $can_accept", do I get "HTTP::Message::decodable" instead of the list of available compression methods described here: http://search.cpan.org/~gaas/libwww-perl-5.833/lib/HTTP/Message.pm ?

marian
What is your exact code? The above code is tested and works fine for me.
Ryan Tate
If it's printing the name of the function, you either didn't load the module or you have a typo in the function call. You should be using 'use strict' in all your scripts to flag things like this -- it would have coughed up the error 'Bareword "HTTP::Message::decodable" not allowed' instead of just printing it. You need to have either 'use LWP::UserAgent;' or 'use HTTP::Message;' at the top of your script for my code to work.
Ryan Tate
A: 

My code is:

#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use HTTP::Message;

my $ua = LWP::UserAgent->new;
my $can_accept = HTTP::Message::decodable;
my $response = $ua->get('http://www.proniko.pl/',
    'Accept-Encoding' => $can_accept,
);
print $response->decoded_content;

print scalar $can_accept;

When I run the code, I get this error message:

$ perl -w test.pl
Bareword "HTTP::Message::decodable" not allowed while "strict subs" in use at test.pl line 7.
Execution of test.pl aborted due to compilation errors.

I am pretty sure that I am missing some Perl module because this code used to work for me as expected.

marian
Hmm, which version of HTTP::Message do you have? You can find out by adding "print $HTTP::Message::VERSION;" to your script. It looks like the decodable function was added in libwww-perl release 5.814 (2008-07-25).
Ryan Tate
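
As an illustration of that check (a sketch, not code from the thread), the script below prints the module version and then asks HTTP::Message directly whether it has a decodable function, using Perl's built-in can method. Testing for the function itself avoids having to interpret version numbers.

```perl
use strict;
use warnings;
use HTTP::Message;

# Report the version of the installed module.
print "HTTP::Message version: $HTTP::Message::VERSION\n";

# can() returns a code reference if the function exists, false otherwise.
if (my $decodable = HTTP::Message->can('decodable')) {
    # Call in scalar context to get the comma-separated string.
    print "Accept-Encoding: ", scalar $decodable->(), "\n";
} else {
    print "decodable() is missing; upgrade libwww-perl to 5.814 or newer\n";
}
```

Calling the function through the code reference also sidesteps the compile-time "Bareword not allowed" error, so the script still compiles on an installation that predates decodable.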
A: 

$HTTP::Message::VERSION is 1.56. The Perl version is v5.8.8, built for x86_64-linux.

marian