ansaurus

Question

How can I rewrite URLs except those of a particular domain?

Answer 1

+3 A:

s{http://www\.nop1\.com/}{http://www.my1.com/redir?http://www.nop1.com}g

Meets your requirements as stated.

If your requirements are a little bit different, you'll need to explain exactly what you want.

Also, I'm not sure what this has to do with negative lookahead.

EDIT: With the reformulated question, here we go:

s{^(http://(?!(?:www\.)?my1\.com).+)}{http://www.my1.com/redir?$1}g

(tweaked it a little)

Anon. 2010-01-26 21:33:00

I think I would have put a / or an end-of-string anchor after .com so it doesn't break when the number of TLDs explodes. :)

brian d foy 2010-01-27 03:06:59

That's probably a good idea, yes. Especially considering that as-written, this would *not* replace the perfectly-valid-URL http://www.my1.com.au

Anon. 2010-01-27 03:13:51
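
The anchoring suggested in these comments could look roughly like the sketch below. It is only an illustration: rewrite_url is a hypothetical helper name, and the inner lookahead (?![^/:?#]) is one assumed way of saying "the host name ends here", not something taken from the answer itself.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a URL in the my1.com redirect unless it
# already points at my1.com or www.my1.com.
sub rewrite_url {
    my ($url) = @_;
    # (?![^/:?#]) succeeds at end of string or before '/', ':', '?' or '#',
    # so hosts such as www.my1.com.au are still rewritten.
    $url =~ s{^(http://(?!(?:www\.)?my1\.com(?![^/:?#])).+)}
             {http://www.my1.com/redir?$1};
    return $url;
}

print rewrite_url($_), "\n" for qw(
    http://www.my1.com
    http://www.my1.com/index.php
    http://www.my1.com.au/
    http://www.foo.com/
);
```

The first two URLs should come back unchanged and the last two should be wrapped in the redirect. Note that this sketch still does not escape the URL that goes into the query string.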

Answer 2

+2 A:

You may want to capture the site name from the URL. If so, try this:

 s{http://www\.(.*?)\.com/}{http://www.my1.com/redir?http://www.$1.com}g

harschware 2010-01-26 21:35:12

PS Thanks to @Anon. I just modified his answer...

harschware 2010-01-26 21:36:08

NOTE: OP has edited problem statement after I posted answer...

harschware 2010-01-26 21:41:04

This will also rewrite http://www.my1.com/ to http://www.my1.com/redir?http://www.my1.com, but the question is how to avoid that for the domain www.my1.com.

Dmytro Leonenko 2010-01-26 21:42:08

Ah, right you are ...

harschware 2010-01-26 22:07:31

Answer 3

+2 A:

$foo='http://www.foo.com/';
$foo =~ s#^(http://(?!(?:www\.)?my1\.com/).+)$#http://www.my1.com/redir?$1#;
print $foo;

Result:

http://www.my1.com/redir?http://www.foo.com/

Mark Byers 2010-01-26 21:35:58

This one fails for http://www.my1.com with no / at the end. Not that I think that you should write URLs like that, but people do.

brian d foy 2010-01-27 03:08:42

Answer 4

+1 A:

s|(http://www\.(?!my1\.)(.*)\.com)|http://www.my1.com/redir?$1|i;

This matches any www.*.com website that isn't www.my1.com and puts it in the redirect.

Jeff B 2010-01-26 21:41:01

Why limit it to hosts that start with www and end with .com? There's a big universe of host names out there. :)

brian d foy 2010-01-27 03:05:18

I know. I was answering the question literally, as the focus of the question seemed to be negative lookaheads. In fact the title used to be, basically, "how do I use a negative lookahead in this URL regex."

Jeff B 2010-01-27 05:27:10

Answer 5

+6 A:

If you are doing this inside a Perl script, don't use regular expressions. It's a mess to read them in this case, and so far every regex answer is broken since it doesn't URI escape the stuff that you want to put into the query string.

Instead of trying to parse a URI yourself, let the time-tested URI module handle all the edge cases for you. The URI::Escape module helps you make the query string so you don't get zapped by odd characters in URLs:

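#!perl

use strict;
use warnings;

use URI;
use URI::Escape;

while( <DATA> )
    {
    chomp;

    # Let URI do the parsing instead of a hand-rolled pattern
    my $url = URI->new( $_ );

    # Pass my1.com and its subdomains through untouched
    if( $url->host =~ /(^|\.)my1\.com$/ ) {
        print "$url\n";
        }
    # Everything else goes through the redirector, safely escaped
    else {
        my $query_string = uri_escape($url->as_string);
        print "http://www.my1.com/redir?$query_string\n";
        }
    }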

__DATA__
http://whole.url.site.com/foo.htm
http://www.google.com
http://www.google.com/search?q=perl+uri
http://www.my1.com/index.php
http://my1.com/index.php
http://moremy1.com/index.php

brian d foy 2010-01-26 22:34:21
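
To see what the escaping in this answer does, here is a small sketch that runs one of the sample URLs through uri_escape and back. The variable names are made up for the illustration, and the thread does not show how the redir script decodes its input, so the uri_unescape step is an assumption.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_unescape);

my $target  = 'http://www.google.com/search?q=perl+uri';
my $escaped = uri_escape($target);

# Reserved characters such as ':', '/', '?', '=' and '+' become %XX
# sequences, so the target travels as one opaque query string.
print "http://www.my1.com/redir?$escaped\n";

# Assumed receiving side: unescaping recovers the original URL.
print uri_unescape($escaped), "\n";
```

Without the escaping, the ? and = in the target URL would sit unprotected inside the redirect's own query string.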

Hi. Thanks for pointing that out. BTW, I have to deal with a database and large blocks of text in which I need to replace URIs, so I have to use regexps anyway for that particular reason.

Dmytro Leonenko 2010-01-27 07:00:54

You don't have to use regexes to reformat them. URI::Find can find them in text and uses a callback to replace what it finds.

brian d foy 2010-01-27 07:15:00
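
A minimal sketch of that URI::Find approach might look like this. It assumes the URI::Find module from CPAN is installed; rewrite_text is a hypothetical helper name, and the my1.com host test is borrowed from the answer above.

```perl
use strict;
use warnings;
use URI::Find;
use URI::Escape qw(uri_escape);

# Hypothetical helper: rewrite every non-my1.com URL found in a block of text.
sub rewrite_text {
    my ($text) = @_;
    my $finder = URI::Find->new(sub {
        my ($uri, $orig) = @_;    # URI object, and the text as it was found
        # Leave non-HTTP schemes and my1.com hosts untouched.
        return $orig unless $uri->scheme =~ /^https?$/;
        return $orig if $uri->host =~ /(^|\.)my1\.com$/;
        return 'http://www.my1.com/redir?' . uri_escape($uri->as_string);
    });
    $finder->find(\$text);        # replaces the URLs in $text in place
    return $text;
}

print rewrite_text("See http://www.foo.com/a?b=c and http://my1.com/index.php\n");
```

The callback's return value replaces each URL that was found, so returning $orig leaves a URL as it was.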
