tags:
views: 221
answers: 5

I need to find the best way (in terms of performance) to determine whether a given string is a URL.
A regexp won't help: www.eeeeeeeeeeeeeee.bbbbbbbbbbbbbbbb.com is a syntactically valid URL, but it doesn't exist on any network known to man.
I am thinking of using cURL and checking whether I get status 200 back, or just calling file_get_contents and analyzing the result.
Is there a better way?

+6  A: 

Don't fetch the whole contents - that could be enormous. Issue a HEAD request instead.

You could do some validation first, of course - remove things which are invalid as URLs, rather than just URLs which aren't currently served by anything. After that, issuing a HEAD request is about as good as it gets. Having said that, it becomes a grey area... what about a URL which returns "authorization required"? It could be a password-protected directory, but if you knew the password you'd then get back a 404 because the file itself doesn't exist...
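
For example, a minimal PHP sketch of such a HEAD check (the function name and timeouts are illustrative, not part of the original answer):

function url_responds($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the headers
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // Treat any 2xx/3xx status as "something is serving this URL".
    return $code >= 200 && $code < 400;
}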

Jon Skeet
A: 
$host != gethostbyname($host)

for checking the host.
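
A rough sketch of how that check might look (the parse_url step is an assumption, not part of the original answer):

$host = parse_url($url, PHP_URL_HOST);
// gethostbyname() returns the hostname unchanged when it cannot resolve,
// so a different return value means the name resolved to an IP address.
if ($host && $host != gethostbyname($host)) {
    // host exists in DNS
}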

Zed
+4  A: 

This article outlines how to perform a DNS request from PHP. That might be the fastest option, although it won't tell you whether the server is online or the file is found; it only tells you that the hostname resolves to an IP. It's up to you whether that fits your definition of "valid".
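
A sketch along those lines using PHP's built-in checkdnsrr() (which record types to check is a judgment call, not something the answer specifies):

$host = parse_url($url, PHP_URL_HOST);
// Does the host have an A or MX record in DNS?
$is_registered = $host && (checkdnsrr($host, 'A') || checkdnsrr($host, 'MX'));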

Chris Thompson
+1  A: 

You don't mean a URL, you mean a Domain Name

ראובן
A: 

I would strongly suggest using cURL, but fetching just the headers without any content.

Here is the function I use to verify that a given URL is valid and reachable.

function __checkUrl($url)
{
    // First check with a pattern whether the URL is syntactically valid
    $pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2})+(:([\d\w]|%[a-fA-F\d]{2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2})*)?$/';
    if (preg_match($pattern, $url))
    {
        $ch = curl_init();

        // set URL and other appropriate options
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
        curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: headers only
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_FORBID_REUSE, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 4);
        curl_setopt($ch, CURLOPT_TIMEOUT, 4);

        // issue the request
        $output = curl_exec($ch);
        // get the response code and the final URL after redirects
        $response_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $newurl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        curl_close($ch);

        // Request failed or resource not found?
        if ($output === false || $response_code == 404) {
            return false;
        } else {
            return $newurl;
        }
    }
    else
    {
        return false;
    }
}

With this function, I first check that the URL is syntactically valid with a regex, and then request it with cURL. Setting CURLOPT_FOLLOWLOCATION to true takes care of 301 and similar redirects, with the number of redirects limited to 3. Finally, we return the effective URL after all redirections.

Hope this helps.

Thanashyam
You know that URL validation regex is quite bogus, right? (As hinted in the OP's question.)
bobince
Is this the "head" request described in the first answer, or you fetch here the entire page?
Itay Moav
@Itay Moav: curl_setopt($ch, CURLOPT_NOBODY, true); -- causes curl to send a HEAD request.
GZipp