Hi. Can you please help me write a Perl regexp that replaces (http://.+) with http://www.my1.com/redir?$1 but does nothing for URLs like http://www.my1.com/ or http://my1.com/?

For instance, I need to replace http://whole.url.site.com/foo.htm with http://www.my1.com/redir?http://whole.url.site.com/foo.htm and http://www.google.com with http://www.my1.com/redir?http://www.google.com, but leave http://www.my1.com/index.php untouched.

Thanks a lot!

+3  A: 
s{http://www\.nop1\.com/}{http://www.my1.com/redir?http://www.nop1.com}g

Meets your requirements as stated.

If your requirements are a little bit different, you'll need to explain exactly what you want.

Also, I'm not sure what this has to do with negative lookahead.

EDIT: With the reformulated question, here we go:

s{^http://(?!(?:www\.)?my1\.com)(.+)}{http://www.my1.com/redir?$1}g

(tweaked it a little)

Anon.
I think I might have put a / or an end-of-string anchor after .com so you don't break when the number of TLDs explodes. :)
brian d foy
That's probably a good idea, yes. Especially considering that as-written, this would *not* replace the perfectly-valid-URL http://www.my1.com.au
Anon.
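A minimal sketch of the boundary fix suggested in the comments above: the lookahead only rejects the host when my1.com is followed by a / or the end of the string, so http://www.my1.com.au is still redirected. The helper name add_redirect and the sample URLs are made up for illustration, and the captured URL is not URI-escaped here.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a single URL in the redirect unless its host
# is exactly my1.com or www.my1.com.
sub add_redirect {
    my ($url) = @_;

    # (?:/|\z) requires the host to end right after ".com", so hosts such as
    # www.my1.com.au are not mistaken for my1.com.
    $url =~ s{^(http://(?!(?:www\.)?my1\.com(?:/|\z)).+)}{http://www.my1.com/redir?$1}s;

    return $url;
}

for my $url (
    'http://www.google.com',
    'http://www.my1.com',
    'http://www.my1.com/index.php',
    'http://my1.com/',
    'http://www.my1.com.au/',
) {
    print add_redirect($url), "\n";
}
```

With this pattern the three my1.com URLs come back unchanged, while the google.com and my1.com.au URLs get the redirect prefix.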
+2  A: 

You may want to capture the site name of the URL; if so, try this:

 s{http://www\.(.*?)\.com/}{http://www.my1.com/redir?http://www.$1.com}g
harschware
PS Thanks to @Anon. I just modified his answer...
harschware
NOTE: OP has edited problem statement after I posted answer...
harschware
This will also rewrite http://www.my1.com/ to http://www.my1.com/redir?http://www.my1.com, but the question is how to avoid this for the domain www.my1.com.
Dmytro Leonenko
Ah, right you are ...
harschware
+2  A: 
$foo='http://www.foo.com/';
$foo =~ s#^(http://(?!(?:www\.)?my1\.com/).+)$#http://www.my1.com/redir?$1#;
print $foo;

Result:

http://www.my1.com/redir?http://www.foo.com/
Mark Byers
This one fails for http://www.my1.com with no / at the end. Not that I think that you should write URLs like that, but people do.
brian d foy
+1  A: 
s|(http://www\.(?!my1\.)(.*)\.com)|http://www.my1.com/redir?$1|i;

This matches any www.*.com website that isn't www.my1.com and puts it in the redirect.

Jeff B
Why limit it to hosts that start with www and end with .com? There's a big universe of host names out there. :)
brian d foy
I know. I was answering the question literally, as the focus of the question seemed to be negative lookaheads. In fact the title used to be, basically, "how do I use a negative lookahead in this URL regex."
Jeff B
+6  A: 

If you are doing this inside a Perl script, don't use regular expressions. It's a mess to read them in this case, and so far every regex answer is broken since it doesn't URI escape the stuff that you want to put into the query string.

Instead of trying to parse a URI yourself, let the time-tested URI module handle all the edge cases for you. The URI::Escape module helps you make the query string so you don't get zapped by odd characters in URLs:

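#!perl
use strict;
use warnings;

use URI;
use URI::Escape;

while( <DATA> )
    {
    chomp;

    my $url = URI->new( $_ );

    # Leave my1.com and its subdomains alone; the (^|\.) keeps
    # look-alike hosts such as moremy1.com from matching.
    if( $url->host =~ /(^|\.)my1\.com$/ ) {
        print "$url\n";
        }
    else {
        # Escape the whole original URL so it is safe inside the query string.
        my $query_string = uri_escape($url->as_string);
        print "http://www.my1.com/redir?$query_string\n";
        }
    }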

__DATA__
http://whole.url.site.com/foo.htm
http://www.google.com
http://www.google.com/search?q=perl+uri
http://www.my1.com/index.php
http://my1.com/index.php
http://moremy1.com/index.php
brian d foy
Hi. Thanks for pointing that out. BTW, I have to deal with a database and large blocks of text in which I need to replace URIs, so I have to use regexps anyway.
Dmytro Leonenko
You don't have to use regexes to reformat them. URI::Find can find them in text and uses a callback to replace what it finds.
brian d foy
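A rough sketch of the URI::Find approach mentioned in the comment above, assuming the URI::Find and URI::Escape modules from CPAN are installed. The helper name rewrite_text and the sample text are made up for illustration; the host test and redirect prefix are carried over from the URI-based answer.

```perl
use strict;
use warnings;
use URI::Find;
use URI::Escape qw(uri_escape);

# Hypothetical helper: rewrite every http URL found in a block of text,
# leaving my1.com hosts alone.
sub rewrite_text {
    my ($text) = @_;

    # The callback gets a URI object and the original matched text;
    # whatever it returns replaces the match in the text.
    my $finder = URI::Find->new( sub {
        my ( $uri, $original ) = @_;

        # Leave non-http URIs (mailto:, ftp:, ...) exactly as found.
        return $original unless $uri->scheme eq 'http';

        # Leave my1.com and its subdomains untouched.
        return $original if $uri->host =~ /(^|\.)my1\.com$/i;

        return 'http://www.my1.com/redir?' . uri_escape( $uri->as_string );
    } );

    $finder->find( \$text );    # modifies $text in place
    return $text;
}

print rewrite_text(
    "See http://www.google.com/search?q=perl and http://www.my1.com/index.php\n"
);
```

For database rows, the same helper could be applied to each text column before writing it back, which avoids maintaining a URL-matching regex by hand.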