I have a Perl script which downloads a large number of files from a remote server. I'd like to avoid hammering the server, so I'd like to avoid downloading a file if it hasn't been modified since my last check. Is there a good way to do this, either in Perl or with a shell script?

Can I get the server to send HTTP 304 rather than HTTP 200 for unmodified files?

+3  A: 

Yes, use LWP::UserAgent and pay special attention to the mirror method. This is also available in the procedural LWP::Simple as the mirror function.

From LWP's POD:

This method will get the document identified by $url and store it in file called $filename. If the file already exists, then the request will contain an "If-Modified-Since" header matching the modification time of the file. If the document on the server has not changed since this time, then nothing happens. If the document has been updated, it will be downloaded again. The modification time of the file will be forced to match that of the server.

The return value is the response object.

HTTP 304 (Not Modified) is the status code the server returns when the document has not changed since the time given in the If-Modified-Since header, meaning your copy is still fresh. LWP handles this internally in mirror -- you needn't worry about it.
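For illustration, a minimal mirror call might look like the sketch below. The URL and local filename are placeholders, not anything from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder URL and local filename, for illustration only.
my $url  = 'http://example.com/';
my $file = 'example.html';

my $ua = LWP::UserAgent->new;

# mirror() adds If-Modified-Since when $file already exists,
# using the file's modification time.
my $response = $ua->mirror($url, $file);

if ($response->code == 304) {
    print "$file is already up to date.\n";
} elsif ($response->is_success) {
    print "Downloaded a fresh copy to $file.\n";
} else {
    warn "Could not mirror $url: " . $response->status_line . "\n";
}
```

On the first run the file does not exist, so a normal download happens; later runs should report 304 as long as the server honours If-Modified-Since.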

Evan Carroll
Thanks. Is there a way to have it send an If-Modified-Since without actually having the file? I download files, check if they're correct, and take diagnostics if not. Otherwise, I just delete the file and continue.
Charles
Yes, there is. With LWP, however, that requires you to use [HTTP::Request](http://search.cpan.org/~gaas/libwww-perl-5.836/lib/HTTP/Request.pm) and `$ua->request()` directly. Check out the source of [LWP::UserAgent](http://cpansearch.perl.org/src/GAAS/libwww-perl-5.836/lib/LWP/UserAgent.pm) and read the definition of the mirror sub. You can also subclass `LWP::UserAgent` and override mirror if you want this to happen transparently; subclassing `LWP::UserAgent` lets you do some really awesome stuff.
Evan Carroll
I'm glad to see that something existed. I was prepared to code up the socket interfaces if that's what this took, but it looks like I won't have to do much more than write a bit of HTTP and a snippet of Perl. Thanks for all the advice! I hadn't heard of LWP before... I guess I need to spend more time on CPAN.
Charles
+1  A: 

This is based on Evan Carroll's answer, but I'm going to elaborate in case this is useful for someone else. I stubbed out the response section; I doubt that part of my code will be interesting.

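#!/usr/bin/perl
use strict;
use warnings;

use HTTP::Date qw(time2str);
use LWP::UserAgent;
use Date::Parse qw(str2time);

# Date of the last check; with no time zone given, str2time assumes local time.
my $lastChecked = '2009-01-01';

my $ua = LWP::UserAgent->new;
# Send If-Modified-Since (in HTTP date format) with every request from this agent.
$ua->default_header('If-Modified-Since' => time2str(str2time($lastChecked)));

my $response = $ua->get('http://example.com/');

if ($response->code == 304) {
    # The server reports nothing newer than $lastChecked.
    print "No changes.\n";
} elsif ($response->is_success) {
    print $response->decoded_content;
} else {
    # status_line already contains the numeric code and the message.
    print "Request failed: '" . $response->status_line . "'\n";
}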
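
When downloading many files, each one may need its own timestamp, which a single default header cannot express. One possible variant passes the header per request instead, since get() accepts extra header name/value pairs. This is only a sketch: the %last_checked hash and the URLs in it are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Date qw(time2str);
use LWP::UserAgent;

# Hypothetical record of when each URL was last fetched (epoch seconds).
my %last_checked = (
    'http://example.com/a.txt' => 1230768000,    # 2009-01-01 00:00:00 UTC
    'http://example.com/b.txt' => 1230768000,
);

my $ua = LWP::UserAgent->new;

for my $url (sort keys %last_checked) {
    # Extra arguments to get() become headers on this request only.
    my $response = $ua->get($url,
        'If-Modified-Since' => time2str($last_checked{$url}));

    if ($response->code == 304) {
        print "$url: not modified\n";
    } elsif ($response->is_success) {
        print "$url: fetched ", length($response->content), " bytes\n";
        # Remember this fetch so the next run only asks for newer versions.
        $last_checked{$url} = time;
    } else {
        warn "$url: ", $response->status_line, "\n";
    }
}
```

In a real script the hash would need to be saved between runs (for example in a small file), since it is otherwise lost when the program exits.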
Charles