I need a regular expression to validate website URLs in Perl.
A:
use Regexp::Common qw /URI/;
# Print every HTTP URI found on each input line.
while (<>) {
    /($RE{URI}{HTTP})/ and print "$1 is an HTTP URI.\n";
}
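Regexp::Common patterns are not anchored, so the loop above also reports a URI that sits in the middle of a longer line. For validating a whole string, a minimal sketch is to anchor the pattern with `\A` and `\z`. It assumes, per my reading of the Regexp::Common::URI::http documentation, that the `-scheme` option takes a pattern, so `'https?'` accepts https as well; the `is_http_url` helper is a hypothetical name.

```perl
use strict;
use warnings;
use Regexp::Common qw /URI/;

# Assumption: -scheme takes a pattern for the scheme part,
# so 'https?' matches both http and https.
my $http_uri = $RE{URI}{HTTP}{-scheme => 'https?'};

# Hypothetical helper: true only when the *whole* string is an
# http or https URI; \A and \z anchor the otherwise unanchored pattern.
sub is_http_url {
    my ($string) = @_;
    return $string =~ /\A$http_uri\z/ ? 1 : 0;
}

for my $candidate ('http://www.example.com/index.html',
                   'https://www.example.com/',
                   'see http://www.example.com/ here',
                   'www.example.com') {
    printf "%-36s %s\n", $candidate,
        is_http_url($candidate) ? 'valid' : 'not valid';
}
```

The third and fourth samples are rejected: one has surrounding text, the other has no scheme.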
singingfish
2010-04-08 10:44:05
Probably "is an HTTP URI" is a better example to show.
ysth
2010-04-08 11:01:35
A:
I don't use regular expressions. I try to create a URI object and see what happens. If it works, I have a URI object that I can query to get the scheme (the other things get turned into "schemeless" URIs).
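use strict;
use warnings;
use URI;

while( <DATA> )
{
    chomp;
    # The second argument only picks the URI class for a schemeless
    # string; it does not add a scheme, so $uri->scheme stays undef.
    my $uri = URI->new( $_, 'http' );
    if( $uri->scheme ) { print "$uri is a URL\n"; }
    else { print "$uri is not a URL\n"; }
}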
__END__
foo.html
http://www.example.com/index.html
abc
www.example.com
If I'm looking for a specific sort of URI, I can query the object to see if it satisfies whatever I need, such as a particular domain name. If I'm doing something with URLs, I'm probably going to make an object anyway, so I might as well start with it.
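A minimal sketch of querying the object, assuming you only want http or https URLs on one domain (the `is_wanted_url` name and the `example.com` rule are hypothetical, chosen for illustration): check `scheme` first, since schemeless strings have none, then look at `host`.

```perl
use strict;
use warnings;
use URI;

# Hypothetical policy: accept only http or https URLs whose host is
# example.com or one of its subdomains.
sub is_wanted_url {
    my ($string) = @_;
    my $uri = URI->new($string);

    # A schemeless string such as "www.example.com" has no scheme at all.
    my $scheme = $uri->scheme;
    return 0 unless defined $scheme && $scheme =~ /\Ahttps?\z/i;

    # http and https URI objects provide host(); lowercase it before comparing.
    my $host = lc( $uri->host // '' );
    return 1 if $host eq 'example.com' || $host =~ /\.example\.com\z/;
    return 0;
}

for my $candidate ('http://www.example.com/index.html',
                   'https://example.com/',
                   'http://example.org/',
                   'www.example.com') {
    printf "%-36s %s\n", $candidate,
        is_wanted_url($candidate) ? 'wanted' : 'rejected';
}
```

Checking the parsed `host` rather than the raw text avoids false matches such as `notexample.com`.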
brian d foy
2010-04-08 11:24:28
@brian, your script doesn't look quite right. I'd expect www.example.com to be a valid URL even without the http service identifier, but the script says it isn't.
Mike
2010-04-08 11:54:59
A host name is not a URL. Without a scheme, www.example.com could be a host name, or a file, or something else. There's no magic that distinguishes any of that stuff on its own. It's the URL that gives stuff context and meaning.
brian d foy
2010-04-08 12:11:33
Well, it seems Regexp::Common qw/URI/ does the same thing. But if someone manually wrote down a lot of URLs without http identifiers, would those URLs not be considered valid?
Mike
2010-04-08 12:11:57
@brian, I see the point: strictly speaking, a URL must include its service identifier.
Mike
2010-04-08 12:13:31
The things you call "Service identifiers" are actually called "schemes", which is why I keep using that term.
brian d foy
2010-04-08 19:07:40
A:
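Try this (it matches an optional http, https, ftp or ftps prefix followed by a host name made of dot-separated letters and digits; paths, ports and hyphens are not accepted):
if ($url =~ m{^((?:ht|f)tps?://)?[0-9a-zA-Z]+(?:\.[0-9a-zA-Z]+)+$})
Or, more strictly (protocol required):
if ($url =~ m{^((?:ht|f)tps?://)[0-9a-zA-Z]+(?:\.[0-9a-zA-Z]+)+$})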
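To see what the protocol-required form accepts, here is a small sketch that runs it over a few sample strings. It writes the host part as one label followed by one or more `.label` groups; the `$strict_url` name and the sample strings are illustrative.

```perl
use strict;
use warnings;

# Scheme required: http, https, ftp or ftps, then dot-separated
# alphanumeric labels. No hyphens, ports, paths or query strings.
my $strict_url = qr{\A(?:ht|f)tps?://[0-9a-zA-Z]+(?:\.[0-9a-zA-Z]+)+\z};

for my $candidate (qw(
    http://www.example.com
    ftp://a.b.c
    www.example.com
    http://www.example.com/index.html
    http://my-site.example.com
)) {
    printf "%-36s %s\n", $candidate,
        $candidate =~ $strict_url ? 'matches' : 'does not match';
}
```

Only the first two samples match: the path and the hyphenated host are rejected, so a pattern like this fits bare host names only.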
Ehsan
2010-04-09 06:58:59