ansaurus

Question

Answer 1

+1 A:

I've used this on a few projects. I don't believe I've run into issues, but I'm sure it's not exhaustive:

$text = preg_replace("
  #((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#ie",
  "'<a href=\"$1\" target=\"_blank\">$3</a>$4'",
  $text
);

Most of the random junk at the end is to deal with situations like http://domain.com. in a sentence (to avoid matching the trailing period). I'm sure it could be cleaned up, but since it worked I've more or less just copied it over from project to project.

Owen 2008-10-15 19:30:51

This has been downvoted... can anyone explain why?

alex 2009-05-27 03:30:22

Some things that jump out at me: use of alternation where character classes are called for (every alternative matches exactly one character); and the replacement shouldn't have needed the outer double-quotes (they were only needed because of the pointless /e modifier on the regex).
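A rough sketch of what those two suggestions could look like in practice: the single-character alternatives are folded into a character class and the /e modifier is dropped (it was removed in PHP 7 anyway). The function name linkify and the sample sentence are made up for illustration; this is not the original poster's code, and it does not HTML-escape the URL.

```php
<?php
// Hypothetical helper: wraps bare URLs in anchor tags.
// $1 = full URL, $2 = URL without the scheme, $3 = the trailing delimiter.
function linkify($text) {
    // Trailing delimiters are a character class; ".<space>" and end-of-string
    // stay as alternatives because they are not single characters.
    $pattern = '#((?:https?|ftp)://(\S*?\.\S*?))([\s;)\]\[{},"\':<]|\.\s|$)#i';
    return preg_replace($pattern, '<a href="$1" target="_blank">$2</a>$3', $text);
}

echo linkify('See http://domain.com. It works.'), "\n";
// See <a href="http://domain.com" target="_blank">domain.com</a>. It works.
```

Because the scheme group is non-capturing here, the backreferences are numbered differently from the original ($2 and $3 instead of $3 and $4).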

Alan Moore 2009-05-30 05:53:50

Solution does not work for the simple case of 'google.com', although it could be argued that 'google.com' is not a valid URL.

John Scipione 2009-11-11 22:27:26

@John Scipione: `google.com` is only a valid relative URL path but not a valid absolute URL. And I think that’s what he’s looking for.

Gumbo 2010-01-04 08:30:57

Answer 2

A:

I've used this one with good success - I don't remember where I got it from

$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";
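For what it's worth, here is a small sketch of how a pattern like this might be used to pull URLs out of free text with preg_match_all(). The pattern is copied as-is from above; the sample text is invented.

```php
<?php
// Pattern from the answer above; $text is a made-up sample.
$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";
$text = 'Docs are at http://www.php.net/filter, mirror at www.example.com.';

// $matches[0] holds every full match found in $text.
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
// [0] => http://www.php.net/filter
// [1] => www.example.com
```

The last character class leaves out punctuation such as "," and ".", which is why the trailing comma and period are not swallowed into the matches.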

Peter Bailey 2008-10-15 19:36:07

^(http://|https://)?(([a-z0-9]?([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6} (may be too greedy, not sure yet, but it's more flexible on protocol and leading www)

andrewbadera 2009-08-26 15:54:55

Answer 3

A:

There is one here.

Milen A. Radev 2008-10-15 19:37:50

Answer 4

+1 A:

there's also

http://www.php.net/filter

Galen 2008-10-15 20:49:37

Answer 5

A:

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. -- jwz

Who says you need to use a regex? If you're trying to validate if a string is a URL, then use the parse_url function in PHP.

Andy Lester 2008-10-16 02:35:59

Answer 6

+39 A:

Galen is right: the filter_var() function is the best way to validate whether a string is a URL or not.

var_dump(filter_var('example.com', FILTER_VALIDATE_URL));

It's bad practice to use regular expressions where it's not necessary.

Stanislav 2008-10-16 06:55:36

this is definitely a great alternative, unfortunately it's php 5.2+ (unless you install the PECL version)

Owen 2008-10-19 08:07:13

filter_var only works in PHP >= 5.2.0

John Scipione 2009-11-11 22:24:29

There's a bug in 5.2.13 (and I think 5.3.2) that prevents urls with dashes in them from validating using this method.

vamin 2010-06-01 23:27:41

filter_var will reject http://test-site.com. I have domain names with dashes, whether they are valid or not. I don't think filter_var is the best way to validate a URL. It will allow a URL like `http://www`

Cesar 2010-09-06 19:30:19

> It will allow a url like 'http://www'

That is OK, since a URL like 'http://localhost' is valid too.

Stanislav 2010-09-07 10:34:30

Answer 7

+6 A:

As per the PHP manual - parse_url should not be used to validate a URL.

Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.

Both parse_url() and filter_var() will pass malformed URLs such as http://...

Therefore in this case - regex is the better method.

catchdave 2008-12-27 14:12:29

This argument doesn't follow. If FILTER_VALIDATE_URL is a little more permissive than you want, tack on some additional checks to deal with those edge cases. Reinventing the wheel with your own attempt at a regex against urls is only going to get you further from a complete check.
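As a minimal sketch of that approach, something along these lines could work: let FILTER_VALIDATE_URL do the heavy lifting, then add whatever extra constraints your application needs. The function name isAcceptableUrl and both "policy" checks (http/https only, host must contain a dot) are assumptions picked for illustration; note that the second one deliberately rejects hosts like localhost.

```php
<?php
// Hypothetical helper: filter_var() first, then application-specific checks.
function isAcceptableUrl($url) {
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    // Assumed policy: only web schemes are accepted.
    if (!in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
        return false;
    }
    // Assumed policy: the host needs at least two dot-separated labels,
    // which rules out 'http://www' and 'http://...'.
    return preg_match('/^[a-z0-9-]+(\.[a-z0-9-]+)+$/i', $parts['host']) === 1;
}

var_dump(isAcceptableUrl('http://www.example.com/path')); // bool(true)
var_dump(isAcceptableUrl('http://www'));                  // bool(false)
var_dump(isAcceptableUrl('example.com'));                 // bool(false)
```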

Tchalvak 2010-07-19 00:50:59

See all the shot-down regexes on this page for examples of why -not- to write your own.

Tchalvak 2010-07-19 02:54:06

You make a fair point Tchalvak. Regexes for something like URLs can (as per other responses) be very hard to get right. Regex is not always the answer. Conversely, regex is also not always the wrong answer either. The important point is to pick the right tool (regex or otherwise) for the job and not be specifically "anti" or "pro" regex. In hindsight, your answer of using filter_var in combination with constraints on its edge-cases looks like the better answer (particularly when regex answers start to get to greater than 100 chars or so, making maintenance of said regex a nightmare).

catchdave 2010-07-20 04:54:50

Answer 8

+1 A:

Edit:
As incidence pointed out, the eregi() function this code relies on has been DEPRECATED since the release of PHP 5.3.0 (2009-06-30), so treat the code below accordingly.

Just my two cents but I've developed this function and have been using it for a while with success. It's well documented and separated so you can easily change it.

// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
 if($url==NULL) return false;

 $protocol = '(http://|https://)';
 $allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';

 $regex = "^". $protocol . // must include the protocol
    '(' . $allowed . '{1,63}\.)+'. // 1 or several sub domains with a max of 63 chars
    '[a-z]' . '{2,6}'; // followed by a TLD
 if(eregi($regex, $url)==true) return true;
 else return false;
}
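
For PHP 5.3 and later, a rough preg_match() port might look like the sketch below. The name isUrlPreg is made up for illustration. One deliberate change: the label sub-pattern is tightened so each label really is capped at 63 characters (in the original, {1,63} repeats the group rather than limiting its length). Like the original, it does not anchor the end of the string, so anything may follow the TLD.

```php
<?php
// Hypothetical preg_match() port of isURL(); not a drop-in guarantee.
// @param string|null $url
// @return bool
function isUrlPreg($url = null) {
    if ($url === null || $url === '') return false;

    $protocol = '(?:https?://)';                         // must include the protocol
    $label    = '[a-z0-9](?:[-a-z0-9]{0,61}[a-z0-9])?';  // one label, 1 to 63 chars

    $regex = '#^' . $protocol .
        '(?:' . $label . '\.)+' .  // one or more sub domains
        '[a-z]{2,6}#i';            // followed by a TLD, case-insensitive
    return preg_match($regex, $url) === 1;
}

var_dump(isUrlPreg('http://www.example.com')); // bool(true)
var_dump(isUrlPreg('http://-hello.com'));      // bool(false)
var_dump(isUrlPreg('example.com'));            // bool(false)
```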

Frankie 2009-03-12 17:17:07

Eregi will be removed in PHP 6.0.0. And domains with "öäåø" will not validate with your function. You probably should convert the URL to punycode first?

incidence 2009-12-10 15:48:03

@incidence absolutely agree. I wrote this in March and PHP 5.3 only came out late June setting eregi as DEPRECATED. Thank you. Gonna edit and update.

Frankie 2009-12-10 18:05:05

Answer 9

A:

Peter's Regex doesn't look right to me for many reasons. It allows all kinds of special characters in the domain name and doesn't test for much.

Frankie's function looks good to me and you can build a good regex from the components if you don't want a function, like so:

^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}

Untested but I think that should work.

Also, Owen's answer doesn't look 100% either. I took the domain part of the regex and tested it on a Regex tester tool http://erik.eae.net/playground/regexp/regexp.html

I put the following line:

(\S*?\.\S*?)

in the "regexp" section and the following line:

-hello.com

under the "sample text" section.

The result allowed the minus character through, because \S matches any non-whitespace character.

Note the regex from Frankie handles the minus because it has this part for the first character:

[a-z0-9]

Which won't allow the minus or any other special character.
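
The same comparison can be reproduced in PHP instead of the online tester. This is only a sketch: the variable names are made up, and the strict pattern is just the first-character class taken on its own, anchored to the start of the string.

```php
<?php
$loose  = '/(\S*?\.\S*?)/';  // domain part of Owen's regex
$strict = '/^[a-z0-9]/i';    // first-character check from Frankie's regex

// The loose pattern happily matches, leading minus included.
preg_match($loose, '-hello.com', $m);
var_dump($m[0]);                              // string(7) "-hello."

// The strict pattern rejects a leading minus but accepts a letter.
var_dump(preg_match($strict, '-hello.com'));  // int(0)
var_dump(preg_match($strict, 'hello.com'));   // int(1)
```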

joedevon 2009-05-30 05:11:48

PHP validation/regex for URL
