ansaurus

Question

How to match bare URLs with regex in PHP?

Answer 1

+2 A:

It appears that you're trying to parse HTML using regular expressions. You might want to rethink that.

Nick Bastin 2010-09-25 08:15:36

How is matching a URL in a string parsing HTML?

grapefrukt 2010-09-25 08:46:17

You're matching the URL within an HTML context. Load the HTML into a DOMDocument and then test each text node against your pattern.

Justin Johnson 2010-09-25 08:50:04
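
A minimal sketch of the DOMDocument approach described in the comment above. The function name findBareUrls, the sample markup, and the deliberately simple URL pattern are assumptions for illustration, not something from the question. It only looks at text nodes, so URLs inside attributes or inside existing links are never touched.

```php
<?php
// Hypothetical helper: return URLs that appear in text nodes only,
// skipping attributes and anything already inside <a>, <script> or <style>.
function findBareUrls(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so imperfect markup does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]';

    $urls = [];
    foreach ($xpath->query($query) as $node) {
        // Assumed pattern: scheme, then everything up to whitespace, quotes or angle brackets.
        if (preg_match_all('#https?://[^\s<>"\']+#i', $node->nodeValue, $m)) {
            $urls = array_merge($urls, $m[0]);
        }
    }
    return $urls;
}

$html = '<p>See http://example.com/page and <a href="http://linked.example/">this link</a>.</p>';
print_r(findBareUrls($html));
```

With the sample markup, only http://example.com/page is returned; the href value is never seen by the regex because it is not a text node.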

I don't see how that linked answer can solve my question, though.

wamp 2010-09-25 09:20:08

@wamp: If you're specifically trying to avoid a greedy algorithm that eats HTML tags, that must mean you're in a position (at least sometimes) where your link will be embedded in HTML. And that way lies madness.

Nick Bastin 2010-09-25 18:59:33

Answer 2

A:

Try this:

function validUrl($url){
    $regex  = '#(^';                   // whole URL: match[1]
    $regex .= '((https?|ftps?)://)?';  // scheme with "://": match[2], scheme alone: match[3]
    $regex .= '(([0-9a-z-]+\.)+';      // complete domain: match[4], last label: match[5]
    $regex .= '([a-z]{2,3}|aero|coop|jobs|mobi|museum|name|travel))'; // TLD: match[6]
    $regex .= '(:[0-9]{1,5})?';        // port: match[7]
    $regex .= '(/[^ ]*)?';             // path and query: match[8]
    $regex .= '$)#i';
    $matches = array();
    if (!preg_match($regex, $url, $matches)) {
        return FALSE;
    }
    // gethostbyname() returns the unmodified hostname when the lookup fails,
    // so compare against the input instead of testing for a falsy value.
    $domain = $matches[4];
    if (gethostbyname($domain) === $domain) {
        return FALSE;
    }
    return $matches;
}

jatt 2010-09-25 08:16:49

I've updated the question to make it clear.

wamp 2010-09-25 08:18:59

@jatt: And how does a more complex regex help in this case? Read the question again.

Tomalak 2010-09-25 08:21:01

And in any case, trying to enumerate “valid” TLDs is an exercise in futility.

bobince 2010-09-25 08:55:03
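
To illustrate the point about enumerated TLDs, here is a small sketch comparing the TLD alternation from the answer above with a looser pattern. The looser pattern (any alphabetic label of 2 to 63 characters) and the sample hostnames are assumptions chosen for the demonstration; it still does not handle internationalized TLDs.

```php
<?php
// TLD alternation copied from the answer above, applied to a bare hostname.
$enumerated = '#^([0-9a-z-]+\.)+([a-z]{2,3}|aero|coop|jobs|mobi|museum|name|travel)$#i';
// Assumed alternative: accept any alphabetic final label of 2 to 63 characters.
$generic = '#^([0-9a-z-]+\.)+[a-z]{2,63}$#i';

foreach (['example.com', 'example.info', 'example.photography'] as $host) {
    printf(
        "%-22s enumerated=%d generic=%d\n",
        $host,
        preg_match($enumerated, $host),
        preg_match($generic, $host)
    );
}
```

The enumerated pattern accepts example.com but rejects example.info and example.photography, because neither TLD is in the list nor 2 to 3 letters long.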

Answer 3

A:

RE

http:\/\/[a-zA-Z0-9\.\-]*

Result

Array
(
    [0] => http://google.com
)

articlestack 2010-09-25 14:13:02
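
The answer does not show the call that produced that result; a sketch of how it might have been obtained with preg_match follows. The input string is an assumption. Note that the character class contains neither "/" nor "?", so the match stops at the host name and drops any path or query string.

```php
<?php
// Assumed input; the original answer does not show the text that was searched.
$text = 'Visit http://google.com/search?q=regex for details.';

// Pattern from the answer, wrapped in "/" delimiters.
preg_match('/http:\/\/[a-zA-Z0-9\.\-]*/', $text, $matches);
print_r($matches);
```

Here $matches[0] is http://google.com, without the /search?q=regex part.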

Answer 4

A:

More effective RE

[hf]t{1,2}p:\/\/[a-zA-Z0-9\.\-]*

Result

Array
(
    [0] => Array
        (
            [0] => ftp://article-stack.com
            [1] => http://google.com
        )
)

articlestack 2010-09-25 14:16:59
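
The nested array in that result has the shape preg_match_all returns, so the call was probably something like the sketch below; the input string is an assumption. The sketch also shows that [hf]t{1,2}p is looser than it looks: it accepts "htp" and "fttp" but not "https". The $stricter pattern is a suggested alternative, not part of the original answer.

```php
<?php
// Pattern from the answer, wrapped in "/" delimiters.
$pattern = '/[hf]t{1,2}p:\/\/[a-zA-Z0-9\.\-]*/';
// Assumed alternative: spell out the schemes instead of building them from a class.
$stricter = '/(?:https?|ftp):\/\/[a-zA-Z0-9.\-]+/';

// Assumed input; the original answer does not show the text that was searched.
$text = 'Mirrors: ftp://article-stack.com and http://google.com';
preg_match_all($pattern, $text, $matches);
print_r($matches);

// Compare both patterns on a misspelled scheme and on https.
foreach (['htp://typo.example', 'https://secure.example'] as $candidate) {
    printf(
        "%-24s original=%d stricter=%d\n",
        $candidate,
        preg_match($pattern, $candidate),
        preg_match($stricter, $candidate)
    );
}
```

The original pattern matches htp://typo.example and misses https://secure.example; the stricter one does the opposite.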
