I'm trying to test the validity of a URL entered by a user, using PHP 5. I thought of using a regex, but even assuming it works correctly all the time, it only tells me whether the URL is syntactically valid. It doesn't tell me whether the URL is correct or actually works.

I'm trying to find another solution to do both if possible. Or is it better to find 2 separate solutions for this?

If a regex is the way to go, what tested regexes exist for URLs?

+2  A: 

In order to test that a URL is 'correct or working', you'll need to actually try and interact with it (like a web browser would, for example).

I'd recommend an HTTP library for Perl like LWP::Simple to do so.

Brabster
So I should break it into 2 tasks then.
Berming
Absolutely. You are asking two completely different questions, for example: is http://google.com a valid HTTP URL? ...and... can I HTTP GET the resource defined by http://google.com over the network right now? Another example of how the questions differ - the answer to the first question will be the same over time, the answer to the second changes if your network goes down.
Brabster
+2  A: 

For validation, see the validate filters: http://www.php.net/manual/en/filter.filters.validate.php

For checking if it exists... well you need to try to access it actually.
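
A minimal sketch of the validation half, assuming the validate filters linked above. The function name `is_syntactically_valid_url` is made up for illustration, and limiting the scheme to http/https is my own assumption rather than anything the filter does for you. It is written without type declarations so it should also run on PHP 5.4+.

```php
<?php
// Hypothetical helper: a purely syntactic check, no network access involved.
function is_syntactically_valid_url($url)
{
    // filter_var() returns the filtered string on success and false on failure.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // FILTER_VALIDATE_URL is not limited to HTTP (ftp:// URLs pass too),
    // so the scheme is checked separately. Assumption: only http/https wanted.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));

    return in_array($scheme, array('http', 'https'), true);
}
```

Something like `var_dump(is_syntactically_valid_url('http://google.com'));` would exercise it. Note that this says nothing about whether the resource exists.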

Mchl
+10  A: 

Instead of cracking my head over a regex (URLs are very complicated), I just use filter_var(), and then attempt to ping the URL using cURL:

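// Step 1: syntactic check. filter_var() returns false for a malformed URL.
if (filter_var($url, FILTER_VALIDATE_URL) !== false)
{
    // Step 2: ask the server for the headers only.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);         // include response headers in the output
    curl_setopt($ch, CURLOPT_NOBODY, true);         // send a HEAD request, skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the output instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // give up after 30 seconds
    curl_exec($ch);
    $status_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // Treat 2xx (success) and 3xx (redirect) responses as a working URL.
    if ($status_code >= 200 && $status_code < 400)
    {
        echo 'URL is valid!';
    }
}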
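
A rough sketch of the same check wrapped in a reusable function. The names `status_is_ok` and `url_is_reachable` and the `$timeoutSeconds` parameter are invented for illustration; the cURL options are the ones used above. It assumes the cURL extension is loaded and keeps the "2xx or 3xx counts as working" rule.

```php
<?php
// Hypothetical helper: 2xx and 3xx count as "working", anything else does not.
// A code of 0 means cURL never received an HTTP response at all.
function status_is_ok($statusCode)
{
    return $statusCode >= 200 && $statusCode < 400;
}

// Hypothetical wrapper: true when the URL is well formed and a HEAD request
// to it is answered with a 2xx or 3xx status.
function url_is_reachable($url, $timeoutSeconds = 30)
{
    // Reject malformed input before touching the network.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }

    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request, no body download
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not print the response
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    curl_exec($ch);

    $statusCode = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope at return.

    return status_is_ok($statusCode);
}
```

Usage would be along the lines of `if (url_is_reachable($url)) { ... }`. The result reflects the network at the moment of the call, so the same URL can pass now and fail later.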
BoltClock
If I'm not mistaken, pinging only tests whether the domain exists, not if the full URL is available.
Marcel Korpel
@Marcel Korpel: good point. Edited my answer to use cURL instead, which should be more viable.
BoltClock
I'd also add the HEAD method there. Otherwise someone could point you to a 1GB file and your server would happily download all of it. Besides, it's not good to download something if you only want to check whether it exists; that's what HEAD is for.
Daniel Kluev
@Daniel Kluev: good point as well, added the respective options now.
BoltClock
wrap it in a function, this way you get to brand it.
YuriKolovsky
@Daniel not all servers will give HEAD.
Gordon
@Gordon +1 for the phrasing, but I'm yet to see such a server. The RFC specifically says "This method is often used for testing hypertext links for validity..."
Artefacto
HTTP redirects do not necessarily mean the URLs are invalid
stillstanding
@stillstanding: I fixed my answer.
BoltClock
@Artefacto Agreed, just mentioning that it could potentially lead to false results when some admin disabled it.
Gordon
+2  A: 

RegExLib is a good place to look for regular expressions:

http://www.regexlib.com/Search.aspx?k=URL

Conrad Frix
+1  A: 

What I would do:

  1. Check that the URL is valid using a very open regex or filter_var with FILTER_VALIDATE_URL.
  2. Do a file_get_contents on the URL and check that $http_response_header[0] contains a 200 HTTP response.

Now, that's dirty; surely there is a more elegant version using cURL or similar.
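
One possible way to do the second step without fetching the body through file_get_contents is PHP's built-in get_headers(), which returns the response headers as an array with the status line at index 0. The sketch below is only an illustration: `parse_status_code` and `url_returns_200` are made-up names, and it assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: pull the numeric code out of a status line such as
// "HTTP/1.1 200 OK". Returns null when the line does not look like one.
function parse_status_code($statusLine)
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $matches) === 1) {
        return (int) $matches[1];
    }

    return null;
}

// Hypothetical helper: true when the first status line reports 200.
function url_returns_200($url)
{
    // get_headers() returns false on failure; @ hides the warning it raises.
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return false;
    }

    return parse_status_code($headers[0]) === 200;
}
```

As far as I know, index 0 holds the status line of the first response, so a URL that redirects would report its 3xx code there and fail this strict 200 check.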

nikic
you could just use [`get_headers`](http://www.php.net/manual/en/function.get-headers.php)
Gordon
Thanks, didn't know that function. PHP is full of surprises ;)
nikic
+1  A: 

There are a bunch of 'check that an external file exists' functions on the file_exists() manual page.

Tim Lytle
+1  A: 

I would use a regex to solve this problem, and I hate regex. This tool, however, makes my life so much easier. Check it out: http://gskinner.com/RegExr/

lando
+1  A: 

Pinging a URL to see if it is a valid URL is nonsense!

  • What if host is down?
  • What if the domain is not ping-able?

If you really want to do "live" testing, it is better to resolve the URL's host name using DNS. DNS is more reliable than ping or HTTP.

<?php
$ip = gethostbyname('www.example.com');

echo $ip;
?>

But even if this fails, the URL can still be valid. It just has no DNS entry. So it depends on your needs.
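
A small sketch of how the DNS check could be applied to a full URL rather than a bare host name. The helper names `host_of` and `host_resolves` are invented for illustration. It relies on the documented behaviour that gethostbyname() returns the unmodified host name when the lookup fails, and it only covers IPv4.

```php
<?php
// Hypothetical helper: extract the host part of a URL, or null if there is none.
function host_of($url)
{
    $host = parse_url($url, PHP_URL_HOST);

    return (is_string($host) && $host !== '') ? $host : null;
}

// Hypothetical helper: true when the URL's host resolves to an IPv4 address.
function host_resolves($url)
{
    $host = host_of($url);
    if ($host === null) {
        return false;
    }

    // An IP literal needs no lookup (and would otherwise look like a failure,
    // because gethostbyname() hands it back unchanged).
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true;
    }

    // gethostbyname() returns the unmodified host name on failure.
    return gethostbyname($host) !== $host;
}
```

For example, `var_dump(host_resolves('http://www.example.com/'));` performs one lookup. A false result only means there is no DNS entry right now, not that the URL is malformed.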

resmo