I've been looking for a simple regex for URLs. Does anybody have one handy that works well? I didn't find one in the Zend Framework validation classes and have seen several implementations.
Thanks
I've used this on a few projects. I don't believe I've run into issues, but I'm sure it's not exhaustive:
$text = preg_replace(
    // the /e modifier was removed in PHP 7.0; a plain replacement string works instead
    "#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#i",
    "<a href=\"$1\" target=\"_blank\">$3</a>$4",
    $text
);
Most of the random junk at the end is there to deal with situations like http://domain.com. in a sentence (to avoid matching the trailing period). I'm sure it could be cleaned up, but since it worked I've more or less just copied it over from project to project.
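To show how the trailing alternatives treat a URL followed by a period, here is a minimal sketch wrapped in a hypothetical linkify() helper (the name is an assumption, not part of the answer). It uses preg_replace_callback(), which works on current PHP versions, and escapes the captured parts with htmlspecialchars().

```php
<?php
// Hypothetical helper: turns bare URLs in $text into anchor tags.
function linkify(string $text): string
{
    // Same pattern as in the answer, written in single quotes, case-insensitive.
    $pattern = '#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|"|\'|:|\<|$|\.\s)#i';

    return preg_replace_callback($pattern, function (array $m): string {
        $href  = htmlspecialchars($m[1], ENT_QUOTES); // full URL, including the scheme
        $label = htmlspecialchars($m[3], ENT_QUOTES); // URL without the scheme
        // $m[4] is the terminator (for example ". ") and is put back unchanged.
        return '<a href="' . $href . '" target="_blank">' . $label . '</a>' . $m[4];
    }, $text);
}

echo linkify('See http://domain.com. Thanks');
// See <a href="http://domain.com" target="_blank">domain.com</a>. Thanks
```

The period after domain.com ends up outside the link because the ". " terminator is captured in the fourth group rather than in the URL.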
I've used this one with good success. I don't remember where I got it from:
$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";
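A quick way to try this pattern is to run it over a sentence with preg_match_all(). The sample text below is made up for illustration.

```php
<?php
$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";

// Made-up sample: one URL followed by a comma, one followed by a period.
$sample = 'Docs are at https://example.com/docs?page=2, mirror at www.example.org.';

preg_match_all($pattern, $sample, $matches);
print_r($matches[0]);
// The final character class excludes ",", "." and ";", so trailing
// punctuation is left out of each match.
```

The two matches are https://example.com/docs?page=2 and www.example.org, without the comma or the closing period.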
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. -- jwz
Who says you need to use a regex? If you're trying to validate if a string is a URL, then use the parse_url function in PHP.
Galen is right: the filter_var() function is the best way to validate whether a string is a URL or not.
var_dump(filter_var('example.com', FILTER_VALIDATE_URL));
It's bad practice to use regular expressions where they're not necessary.
As per the PHP manual - parse_url should not be used to validate a URL.
Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.
Both parse_url() and filter_var() will pass malformed URLs such as http://...
Therefore in this case - regex is the better method.
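As a rough illustration of why parsing alone is not validation, here is a minimal sketch that extracts the host with parse_url() and then checks it with a regex. The helper name hasPlausibleHost() and the host pattern are assumptions made for this example, not something taken from the manual.

```php
<?php
// Hypothetical helper: parse_url() only splits the string into components,
// so the host is checked separately against a simple label.label.tld pattern.
function hasPlausibleHost(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host component could be extracted
    }
    // Assumed rule: each label starts and ends with a letter or digit.
    $hostPattern = '~^(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\.)+[a-z]{2,}$~i';
    return preg_match($hostPattern, $host) === 1;
}

var_dump(hasPlausibleHost('http://...'));              // bool(false)
var_dump(hasPlausibleHost('http://example.com/path')); // bool(true)
```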
Edit:
As incidence pointed out, this code relies on eregi(), which has been DEPRECATED since the release of PHP 5.3.0 (2009-06-30), so use it accordingly.
Just my two cents but I've developed this function and have been using it for a while with success. It's well documented and separated so you can easily change it.
// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
    if($url==NULL) return false;
    $protocol = '(http://|https://)';
    $allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';
    $regex = "^". $protocol . // must include the protocol
             '(' . $allowed . '{1,63}\.)+'. // 1 or several sub domains with a max of 63 chars
             '[a-z]' . '{2,6}'; // followed by a TLD
    if(eregi($regex, $url)==true) return true;
    else return false;
}
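Since eregi() was removed in PHP 7.0, here is a minimal sketch of the same check ported to preg_match(). The name isUrlPreg() is hypothetical. The label pattern is adjusted (an assumption about the intent) so that each label is capped at 63 characters, instead of repeating the whole group up to 63 times as {1,63} does in the original.

```php
<?php
// Checks if a string starts with http(s):// followed by a dotted host name.
// Like the eregi() version, it is case-insensitive and has no end anchor,
// so anything after the TLD (path, query) is not inspected.
function isUrlPreg(?string $url = null): bool
{
    if ($url === null || $url === '') {
        return false;
    }
    $protocol = '(?:http://|https://)';                   // must include the protocol
    $label    = '[a-z0-9](?:[-a-z0-9]{0,61}[a-z0-9])?';   // one label, 1 to 63 chars
    $regex    = '~^' . $protocol
              . '(?:' . $label . '\.)+'                   // one or more labels, each followed by a dot
              . '[a-z]{2,6}~i';                           // followed by a TLD
    return preg_match($regex, $url) === 1;
}

var_dump(isUrlPreg('http://example.com'));  // bool(true)
var_dump(isUrlPreg('http://-hello.com'));   // bool(false)
var_dump(isUrlPreg('example.com'));         // bool(false)
```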
Peter's Regex doesn't look right to me for many reasons. It allows all kinds of special characters in the domain name and doesn't test for much.
Frankie's function looks good to me and you can build a good regex from the components if you don't want a function, like so:
^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}
Untested but I think that should work.
Also, Owen's answer doesn't look 100% either. I took the domain part of the regex and tested it on a Regex tester tool http://erik.eae.net/playground/regexp/regexp.html
I put the following line:
(\S*?\.\S*?)
in the "regexp" section and the following line:
-hello.com
under the "sample text" section.
The result allowed the minus character through, because \S matches any non-whitespace character.
Note the regex from Frankie handles the minus because it has this part for the first character:
[a-z0-9]
Which won't allow the minus or any other special character.
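The same comparison can be reproduced in PHP instead of the online tester. This is a small sketch; the variable names are made up for the example.

```php
<?php
$sample = '-hello.com';

// Domain part of Owen's pattern: \S accepts any non-whitespace character, including "-".
$looseMatches = preg_match('~(\S*?\.\S*?)~', $sample) === 1;

// First label from Frankie's pattern: the first character must be a letter or digit.
$strictMatches = preg_match('~^[a-z0-9]([-a-z0-9]*[a-z0-9]+)?\.~i', $sample) === 1;

var_dump($looseMatches);  // bool(true)
var_dump($strictMatches); // bool(false)
```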