How can I remove a substring from a string using Perl? For example, $URL contains http://xyz.com/Main#abcd.aspx

I want to check for 'Main#' and strip it out of $URL. Can anyone help me out?

First I need to check whether the string Main# exists or not. If it exists, then strip it; otherwise nothing needs to be done. So only an if statement.

+4  A: 
$URL =~ s/Main#//;

Which is a no-op if "Main#" isn't present.
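Since the question asks to check and strip, it may help to know that s/// returns a true value only when it actually replaced something, so one statement can do both jobs. A minimal sketch, assuming a made-up helper name (strip_main) and sample URLs chosen purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: removes the first 'Main#' from a URL and reports
# whether anything was removed.
sub strip_main {
    my ($url) = @_;
    # s/// returns a true value only if a substitution happened.
    my $changed = ( $url =~ s/Main#// ) ? 1 : 0;
    return ( $url, $changed );
}

my ( $stripped, $changed ) = strip_main('http://xyz.com/Main#abcd.aspx');
print "$stripped (changed: $changed)\n";

my ( $same, $changed_again ) = strip_main('http://xyz.com/abcd.aspx');
print "$same (changed: $changed_again)\n";
```

If you only need the stripped string and not the yes/no answer, the bare substitution is enough.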

Dave W. Smith
+4  A: 

use strict;
use warnings;
use URI::Split qw( uri_split uri_join ); 

my $str = "http://xyz.com/Main#abc.aspx";
my ($scheme, $auth, $path, $query, $frag)  = uri_split( $str );

That will give you the URI as a series of tokens, but beyond that, the specifics of what you want to do are a bit unclear.

  1. Are you trying to extract the Path so you can use it?
  2. Are you trying to recompose the URI without a path?
  3. Are you trying to extract only a specific node in the path?
  4. Are you trying to recompose the URI without a specific node in the path?
  5. Are you trying to filter out only the literal string 'Main', not anything else?

Well, first I need to check whether the string Main# exists or not. If it exists, then strip it; otherwise nothing needs to be done. So only an if statement.

if ( $str =~ /Main#/ ) {
    $str =~ s/Main#//g;
}

This removes the literal string 'Main#' from anywhere in the URL if it exists. It could also be written simply as

$str =~ s/Main#//g;

because if the string doesn't exist, no replacement is made.

Notable Complications

If you are reading the URI on the server side, that is, it arrived as a request string from a web client, you'll likely find that the #.* part, also known as the document fragment, has already been removed by the time you get it. In my experience, this is how web clients behave.

RFC 3986 (section 3.5) describes this behaviour: the fragment is separated from the rest of the URI before it is dereferenced, so clients do not send it to the server.
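To make the complication concrete: in the question's example URL, the '#' in 'Main#' is itself the fragment delimiter. The sketch below splits the URL with the component regex from RFC 3986, Appendix B (used here instead of a module so it runs on core Perl alone), and then shows what is left once the fragment is dropped. The variable names are my own.

```perl
use strict;
use warnings;

my $url = 'http://xyz.com/Main#abcd.aspx';

# Component-splitting regex adapted from RFC 3986, Appendix B.
my ( $scheme, $authority, $path, $query, $fragment ) =
    $url =~ m{^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?};

print "path:     $path\n";
print "fragment: $fragment\n";

# What a server would typically see: everything before the '#'.
( my $sent = $url ) =~ s/#.*//s;
print "sent:     $sent\n";
```

So on the server side there may be no 'Main#' left to strip, only a path ending in '/Main'.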

Kent Fredric
A: 
$URL =~ s/Main#//;

will strip out the first instance of Main#. Adding g after the last / will make it strip out all instances. Stripping out the last instance is less trivial; here are a couple of ways:

$URL = reverse($URL);
$URL =~ s/#niaM//;
$URL = reverse($URL);

or

$URL =~ s/^(.*)Main#/$1/;

or

my $index = rindex( $URL, 'Main#' );
if ($index >= 0) { substr( $URL, $index, 5, '' ) }
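As a quick sanity check, the three approaches can be wrapped in subs and run against a URL that contains 'Main#' twice. This is only a sketch: the sub names and the sample URL are invented for the comparison.

```perl
use strict;
use warnings;

sub strip_last_reverse {
    my ($url) = @_;
    my $rev = reverse $url;       # scalar context: reverses the characters
    $rev =~ s/#niaM//;            # first match reversed = last match in original
    return scalar reverse $rev;
}

sub strip_last_greedy {
    my ($url) = @_;
    $url =~ s/^(.*)Main#/$1/;     # greedy .* pushes the match to the last 'Main#'
    return $url;
}

sub strip_last_rindex {
    my ($url) = @_;
    my $target = 'Main#';
    my $index  = rindex( $url, $target );
    substr( $url, $index, length($target), '' ) if $index >= 0;
    return $url;
}

my $sample  = 'http://xyz.com/Main#a/Main#b.aspx';
my @results = map { $_->($sample) }
    ( \&strip_last_reverse, \&strip_last_greedy, \&strip_last_rindex );
print "$_\n" for @results;
```

All three should leave the first 'Main#' in place and remove only the second one.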

If you want to do more complex things (like strip out "com" everywhere except in the hostname) you may want to parse the URI with the URI or URI::Split modules.
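For instance, a rough sketch of the "strip 'com' everywhere except the hostname" case using URI::Split (part of the URI distribution on CPAN, so it has to be installed). The sample URL is made up for illustration:

```perl
use strict;
use warnings;
use URI::Split qw( uri_split uri_join );

my $url = 'http://xyz.com/dotcom/index.aspx?src=dotcom';
my ( $scheme, $auth, $path, $query, $frag ) = uri_split($url);

# Strip 'com' from every component except the authority (hostname).
# The loop variable aliases each element, so the originals are modified.
for my $part ( $path, $query, $frag ) {
    $part =~ s/com//g if defined $part;
}

my $result = uri_join( $scheme, $auth, $path, $query, $frag );
print "$result\n";
```

The defined check matters because uri_split returns undef for components that are absent, such as the fragment here.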

ysth
A: 

'perldoc perlop' -- look in the s/// section
'perldoc perlre' -- read the entire document
http://oreilly.com/catalog/9780596001322/

Ether