Hi,
I'm having trouble using curl to retrieve headers from a handful of sites.
Some examples are digg.com and microsoft.com.
function get_headers_curl($url, $port)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PORT, $port);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $r = curl_exec($ch);
    // split() is deprecated; explode() does the same job here.
    $r = explode("\n", $r);
    return $r;
}
That is the function and the options I am currently using, and for ease of use I have a little test script running at http://isitup.org/test.php?d=example.com. It simply returns the headers of the response (or, for the example sites, the lack of one).
The problem is that these sites seem to ignore the request and I get no response. I've had a play around with different options but cannot seem to get a response.
Is there something I'm missing? Or is it just not possible to access such sites using curl?
Regards,
Sam
Edit:
test.php is the following:
<?php
$domain = preg_replace("/[^A-Za-z0-9-\/\.\:]/", "", trim($_GET["d"]));
$agent = "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.2) Gecko/20121223 Ubuntu/9.25 (jaunty) Firefox/3.8";

function get_headers_curl($url, $port)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // curl_setopt($ch, CURLOPT_HEADER, true);
    // curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PORT, $port);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    $r = curl_exec($ch);
    // split() is deprecated; explode() does the same job here.
    $r = explode("\n", $r);
    return $r;
}

$headers = get_headers_curl("http://".$domain, 80);
print("<pre>".print_r($headers, true)."</pre>");
?>
However, the new user agent still gets no response from these sites...
Update: Whoops, spotted my error: I moved $agent into the function and now it works! Thanks :P
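For anyone who hits the same thing: the problem was PHP variable scoping, not curl. A variable defined at file scope is not visible inside a function unless it is passed in as a parameter (or pulled in with global). A minimal standalone sketch of the issue, with hypothetical function names and no curl involved:

```php
<?php
// File-scope variable, like $agent in test.php above.
$agent = "Mozilla/5.0 (example UA)";

function agent_from_outer_scope()
{
    // PHP functions do NOT see file-scope variables, so $agent is
    // undefined here; isset() is false and we fall back to null.
    return isset($agent) ? $agent : null;
}

function agent_as_parameter($agent)
{
    // Passing the value in (or declaring `global $agent;`) makes it visible.
    return $agent;
}

var_dump(agent_from_outer_scope()); // NULL
var_dump(agent_as_parameter($agent));
```

So in the test.php above, curl_setopt($ch, CURLOPT_USERAGENT, $agent) was effectively setting an empty user agent until $agent was moved inside the function.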