ansaurus

Question

How can I remove part of a URL using Perl?

Answer 1

+4 A:

$URL =~ s/Main#//;

This is a no-op if "Main#" isn't present.

Dave W. Smith 2009-06-14 21:43:29

Answer 2

+4 A:

use strict;
use warnings;
use URI::Split qw( uri_split uri_join ); 

my $str = "http://xyz.com/Main#abc.aspx";
# uri_split returns the scheme, authority, path, query and fragment, in that order.
my ($scheme, $auth, $path, $query, $frag) = uri_split( $str );

That will give you the URI as a series of tokens, but beyond that, the specifics of what you want to do are a bit unclear.

Are you trying to extract the Path so you can use it?
Are you trying to recompose the URI without a path?
Are you trying to extract only a specific node in the path?
Are you trying to recompose the URI without a specific node in the path?
Are you trying to filter out only the literal string 'Main', and nothing else?
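If, for example, the goal turned out to be recomposing the URI without its fragment (an assumption, since the question doesn't say), a minimal sketch could pass the tokens from uri_split back to uri_join with the fragment left undefined. The variable name $clean is only illustrative.

```perl
use strict;
use warnings;
use URI::Split qw( uri_split uri_join );

my $str = "http://xyz.com/Main#abc.aspx";

# uri_split returns scheme, authority, path, query and fragment, in that order.
my ($scheme, $auth, $path, $query, $frag) = uri_split($str);

# Rejoin everything except the fragment; undefined parts are left out.
my $clean = uri_join($scheme, $auth, $path, $query, undef);

print "fragment: $frag\n";
print "without fragment: $clean\n";
```

Note that in this sample URL, "Main" is the end of the path and "abc.aspx" is the fragment, so the literal text "Main#" straddles two components. That is one reason a plain substitution may be simpler if you only want to drop that literal string.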

Well, first I need to check whether the string #Main exists; if it does, strip it, otherwise nothing needs to be done. So only an if statement.

if ( $str =~ /#Main/ ) {
    $str =~ s/#Main//g;
}

This will remove the literal string '#Main' from anywhere in the URL if it exists. It could also just be written as:

$str =~ s/#Main//g;

Because if it doesn't exist, no replacements will be done.
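If you still want to know whether anything was removed, the substitution's own return value can tell you, so the separate match is not needed. A small sketch, where strip_main and the sample URLs are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the cleaned string and how many times '#Main' was removed.
sub strip_main {
    my ($str) = @_;
    # s///g returns the number of substitutions, or a false value if there were none.
    my $count = ( $str =~ s/#Main//g ) || 0;
    return ( $str, $count );
}

my ( $with,    $n_with )    = strip_main("http://xyz.com/page#Main");
my ( $without, $n_without ) = strip_main("http://xyz.com/page");

print "$with ($n_with removed)\n";
print "$without ($n_without removed)\n";
```

The second call leaves the string untouched and reports zero removals, which is the no-op case described above.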

Notable Complications

If you are reading the URI from a web client's request (that is, it is the request string your server received), you'll likely find that the #.* part, also known as the document fragment, has already been removed by the time you get it. In my experience, this is how web clients behave.

I'm pretty sure there's an RFC somewhere that specifies this behavior, but laziness--

Kent Fredric 2009-06-14 21:44:25

Answer 3

A:

$URL =~ s/Main#//;

will strip out the first instance of Main#. Adding g after the last / will make it strip out all instances. Stripping out the last instance is less trivial; here are a couple of ways:

$URL = reverse($URL);
$URL =~ s/#niaM//;
$URL = reverse($URL);

or

$URL =~ s/^(.*)Main#/$1/;

or

my $index = rindex( $URL, 'Main#' );
if ($index >= 0) { substr( $URL, $index, 5, '' ) }
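To check that the three approaches agree, here is a quick sketch that runs each one on a copy of the same string. The sample URL, which contains "Main#" twice, is made up for the test.

```perl
use strict;
use warnings;

my $original = "http://xyz.com/Main#/foo/Main#abc.aspx";

# 1. Reverse the string, remove the first reversed match, reverse back.
my $by_reverse = scalar reverse $original;
$by_reverse =~ s/#niaM//;
$by_reverse = scalar reverse $by_reverse;

# 2. Greedy prefix: (.*) grabs as much as possible, so the match ends at the last "Main#".
( my $by_regex = $original ) =~ s/^(.*)Main#/$1/;

# 3. Find the last occurrence with rindex and cut it out with four-argument substr.
my $by_rindex = $original;
my $index = rindex( $by_rindex, 'Main#' );
substr( $by_rindex, $index, length('Main#'), '' ) if $index >= 0;

print "$_\n" for $by_reverse, $by_regex, $by_rindex;
```

Each variant leaves the first "Main#" in place and removes only the second one.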

If you want to do more complex things (like strip out "com" everywhere except in the hostname) you may want to parse the URI with the URI or URI::Split modules.

ysth 2009-06-14 21:48:00

Answer 4

A:

'perldoc perlop' -- look in the s/// section
'perldoc perlre' -- read the entire document
http://oreilly.com/catalog/9780596001322/

Ether 2009-06-17 21:18:01
