The trick is figuring out how you want to pretty-print each sort of URL, so in that case you need to tell your script what to do in each case:
use URI;
while( <DATA> ) {
chomp;
my $uri = URI->new( $_ );
my $s = $uri->scheme;
my $rest = do {
if( $s =~ /(?:https?|ftp)/ ) {
$uri->host . $uri->path_query
}
elsif( $s eq 'mailto' ) {
$uri->path
}
elsif( ! $s ) {
$uri
}
};
print "$uri -> $rest\n";
}
__END__
http://www.example.com/foo/bar.html
www.example.com/foo/bar.html
ftp://www.example.com
mailto:[email protected]
https://www.example.com/foo?a=b;c=d
http://joe:[email protected]/login
This produces:
http://www.example.com/foo/bar.html -> www.example.com/foo/bar.html
www.example.com/foo/bar.html -> www.example.com/foo/bar.html
ftp://www.example.com -> www.example.com
mailto:[email protected] -> [email protected]
https://www.example.com/foo?a=b;c=d -> www.example.com/foo?a=b;c=d
http://joe:[email protected]/login -> www.example.com/login
If you want something different for a particular URL, you just need to make a branch for it and put together the parts that you want. Notice the URI
also handles schemeless URIs.
If you don't want long URI strings for your pretty printing, you might throw in something like this to cut off the string after so many characters:
substr( $rest, 20 ) = '...' if length $rest > 20;
Here's a solution with given
, which is slightly cleaner, but also slightly uglier. This is the Perl 5.010 version:
use 5.010;
use URI;
while( <DATA> ) {
chomp;
my $uri = URI->new( $_ );
my $r;
given( $uri->scheme ) {
when( /(?:https?|ftp)/ ) { $r = $uri->host . $uri->path_query }
when( 'mailto' ) { $r = $uri->path }
default { $r = $uri }
}
print "$uri -> $r\n";
}
It's uglier because I have to repeat that assignment to $r
. Perl 5.14 is going to fix that though be letting given
have a return value. Since that stable version isn't available yet, you have to use the experimental 5.13 track:
use 5.013004;
use URI;
while( <DATA> ) {
chomp;
my $uri = URI->new( $_ );
my $r = do {
given( $uri->scheme ) {
when( /(?:https?|ftp)/ ) { $uri->host . $uri->path_query }
when( 'mailto' ) { $uri->path }
default { $uri }
}
};
print "$uri -> $r\n";
}