I have a wget
-like script which downloads a page and then retrieves all the files linked in IMG tags on that page.
Given the URL of the original page and the the link extracted from the IMG tag in that page I need to build the URL for the image file I want to retrieve. Currently I use a function I wrote:
sub build_url {
my ( $base, $path ) = @_;
# if the path is absolute just prepend the domain to it
if ($path =~ /^\//) {
($base) = $base =~ /^(?:http:\/\/)?(\w+(?:\.\w+)+)/;
return "$base$path";
}
my @base = split '/', $base;
my @path = split '/', $path;
# remove a trailing filename
pop @base if $base =~ /[[:alnum:]]+\/[\w\d]+\.[\w]+$/;
# check for relative paths
my $relcount = $path =~ /(\.\.\/)/g;
while ( $relcount-- ) {
pop @base;
shift @path;
}
return join '/', @base, @path;
}
The thing is, I'm surely not the first person solving this problem, and in fact it's such a general problem that I assume there must be some better, more standard way of dealing with it, using either a core module or something from CPAN - although via a core module is preferable. I was thinking about File::Spec
but wasn't sure if it has all the functionality I would need.