views:

128

answers:

4

I need a regular expression for validating a website URL in Perl.

+9  A: 

Regexp::Common::URI::http

Kinopiko
+2  A: 
use Regexp::Common qw /URI/;
while (<>) {
    /($RE{URI}{HTTP})/ and print "$1 is an HTTP URI.\n";
}
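Note that the loop above finds an HTTP URI anywhere in the line. If the goal is to validate that a whole string is a URL, one option is to anchor the pattern. The following is only a sketch: the helper name `is_http_uri` and the sample strings are made up for illustration, and the `-scheme` option is used on the assumption that `https` should be accepted as well.

```perl
use strict;
use warnings;
use Regexp::Common qw(URI);

# Build the pattern once; -scheme => 'https?' accepts http and https.
my $http_re = $RE{URI}{HTTP}{-scheme => 'https?'};

# Hypothetical helper: true only if the ENTIRE string is an http(s) URI.
sub is_http_uri {
    my ($candidate) = @_;
    return $candidate =~ /\A$http_re\z/ ? 1 : 0;
}

# Illustrative inputs, not taken from the question.
for my $candidate (
    'http://www.example.com/index.html',
    'see http://www.example.com/',
    'www.example.com',
) {
    printf "%-36s %s\n", $candidate,
        is_http_uri($candidate) ? 'valid' : 'not valid';
}
```

Without the `\A` and `\z` anchors, the second sample would count as a match because it contains a URI.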
singingfish
Probably "is an HTTP URI" is a better example to show.
ysth
+7  A: 

I don't use regular expressions. I try to create a URI object and see what happens. If it works, I have a URI object that I can query to get the scheme (the other things get turned into "schemeless" URIs).

use URI;

while( <DATA> )
    {
    chomp;
    my $uri = URI->new( $_, 'http' );
    if( $uri->scheme ) { print "$uri is a URL\n"; }
    else               { print "$uri is not a URL\n"; }
    }

__END__
foo.html
http://www.example.com/index.html
abc
www.example.com

If I'm looking for a specific sort of URI, I can query the object to see if it satisfies whatever I need, such as a particular domain name. If I'm doing something with URLs, I'm probably going to make an object anyway, so I might as well start with it.
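As a rough sketch of querying the object for "a particular domain name", something like the following could work. The helper name `is_example_com_url`, the choice of `example.com`, and the restriction to http and https are all assumptions made for illustration.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: accept only http/https URLs whose host is
# example.com or one of its subdomains.
sub is_example_com_url {
    my ($string) = @_;
    my $uri = URI->new( $string, 'http' );

    # Schemeless strings (foo.html, www.example.com) have no scheme.
    my $scheme = $uri->scheme;
    return 0 unless defined $scheme && $scheme =~ /\Ahttps?\z/;

    # Safe to call host here: http and https URIs provide that method.
    my $host = $uri->host;
    return 0 unless defined $host;

    return $host =~ /(?:\A|\.)example\.com\z/i ? 1 : 0;
}

# Illustrative inputs.
for my $string (
    'http://www.example.com/index.html',
    'http://www.example.org/',
    'www.example.com',
) {
    printf "%-36s %s\n", $string,
        is_example_com_url($string) ? 'accepted' : 'rejected';
}
```

The scheme check comes first on purpose: other URI classes (mailto, for instance) do not have a `host` method, so calling it blindly could die.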

brian d foy
@brian, your script doesn't look quite right. I would have thought that www.example.com is still a valid URL even without the http service identifier, but the script says the opposite.
Mike
A host name is not a URL. Without a scheme, www.example.com could be a host name, or a file, or something else. There's no magic that distinguishes any of that stuff on its own. It's the URL that gives stuff context and meaning.
brian d foy
Well, it seems Regexp::Common qw/URI/ does the same thing. But if someone manually writes down a lot of URLs without http identifiers, would those URLs not be considered valid?
Mike
I think I just answered that.
brian d foy
@brian, I see the point. Strict URLs must include their service identifiers.
Mike
The things you call "Service identifiers" are actually called "schemes", which is why I keep using that term.
brian d foy
@brian, thanks for pointing this out
Mike
A: 

Try this:

if($url=~ /^((ht|f)tp(s?)\:\/\/)?([0-9a-zA-Z]+\.[0-9a-zA-Z]+)+$/)

Or, more strictly (protocol required):

if($url=~ /^((ht|f)tp(s?)\:\/\/)([0-9a-zA-Z]+\.[0-9a-zA-Z]+)+$/)
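To see what these two patterns actually accept, here is a small check that runs them against a few strings. The patterns are copied unchanged from above; the sample strings are illustrative assumptions, not from the question.

```perl
use strict;
use warnings;

# The two patterns from above, copied unchanged.
my $loose  = qr/^((ht|f)tp(s?)\:\/\/)?([0-9a-zA-Z]+\.[0-9a-zA-Z]+)+$/;
my $strict = qr/^((ht|f)tp(s?)\:\/\/)([0-9a-zA-Z]+\.[0-9a-zA-Z]+)+$/;

# Illustrative inputs.
my @samples = (
    'http://example.com',
    'www.example.com',
    'http://www.example.com/index.html',
    'http://my-site.example.com',
);

for my $url (@samples) {
    printf "%-36s loose=%-3s strict=%s\n", $url,
        ( $url =~ $loose  ? 'yes' : 'no' ),
        ( $url =~ $strict ? 'yes' : 'no' );
}
```

Both patterns allow only letters, digits, and dots after the optional scheme, so any URL with a path, query string, port, or hyphenated host name is rejected.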
Ehsan
Downvote because that's just plain wrong.
daxim
This one is better.
Ehsan