Hi,

I'm having trouble using curl to retrieve headers for a minority of sites.

Some examples are digg.com and microsoft.com.

function get_headers_curl($url, $port)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL,            $url);
    curl_setopt($ch, CURLOPT_HEADER,         true);
    curl_setopt($ch, CURLOPT_NOBODY,         true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PORT,           $port);
    curl_setopt($ch, CURLOPT_TIMEOUT,        10);

    $r = curl_exec($ch);
    $r = split("\n", $r);
    return $r;
}

That is the function and the options I am currently using. For ease of use I have a little test script running at http://isitup.org/test.php?d=example.com. It just returns the response headers or, for the example sites, nothing at all.

The problem is that these sites seem to ignore the request and I get no response. I've had a play around with different options but cannot seem to get a response.

Is there something I'm missing? Or is it just not possible to access such sites using curl?

Regards,

Sam

Edit:

test.php is the following:

<?php

$domain = preg_replace("/[^A-Za-z0-9-\/\.\:]/", "", trim($_GET["d"]));

$agent = "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.2) Gecko/20121223 Ubuntu/9.25 (jaunty) Firefox/3.8";

function get_headers_curl($url, $port)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL,            $url);
//  curl_setopt($ch, CURLOPT_HEADER,         true);
//  curl_setopt($ch, CURLOPT_NOBODY,         true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PORT,    $port);
    curl_setopt($ch, CURLOPT_TIMEOUT,        10);
    curl_setopt($ch, CURLOPT_USERAGENT,      $agent);


    $r = curl_exec($ch);
    $r = split("\n", $r);
    return $r;
}

$headers = get_headers_curl("http://".$domain, 80);

print("<pre>".print_r($headers,true)."</pre>");


?>

However the new user agent still does not get a response from these sites...

Update: Whoops, spotted my error. $agent was defined outside the function, so it was not in scope there. I shifted it into the function and yes, it works! Thanks :P
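
For anyone hitting the same thing: a PHP function cannot see a variable defined at the top level of the script unless it is passed in or declared global, so CURLOPT_USERAGENT was being set from an undefined variable. Below is a minimal sketch of one way to fix it, assuming the agent string is passed in as a third parameter. The name get_headers_curl_with_agent() and the 'error' array key are made up for illustration, and explode() stands in for the deprecated split().

```php
<?php
// Hypothetical variant of get_headers_curl() that receives the agent as a parameter.
function get_headers_curl_with_agent($url, $port, $agent)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL,            $url);
    curl_setopt($ch, CURLOPT_HEADER,         true);  // include headers in the output
    curl_setopt($ch, CURLOPT_NOBODY,         true);  // skip the response body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the result as a string
    curl_setopt($ch, CURLOPT_PORT,           $port);
    curl_setopt($ch, CURLOPT_TIMEOUT,        10);
    curl_setopt($ch, CURLOPT_USERAGENT,      $agent); // now in scope

    $r = curl_exec($ch);
    if ($r === false) {
        // Assumption for illustration: report the failure under an 'error' key.
        return array('error' => curl_error($ch));
    }

    return explode("\n", $r);
}
```

It would be called as get_headers_curl_with_agent("http://".$domain, 80, $agent), with $agent defined as in test.php.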

+1  A: 

They both work fine for me when I add a user agent string with CURLOPT_USERAGENT.

// e.g.
$agent = 'Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.2) Gecko/20121223 Ubuntu/9.25 (jaunty) Firefox/3.8';
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
GZipp
Could you tell me all the curl options that you were using? Maybe I am doing something wrong with my current options. Also, thanks for the split() info, I didn't know about that. I'll use explode() instead.
Sam
I added those two lines to your function, which I see you've now done.
GZipp
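
On the split() point from the comments: split() was deprecated in PHP 5.3 and removed in PHP 7, so explode() is the usual replacement. One detail worth noting is that HTTP header lines end in \r\n, so exploding on "\n" alone leaves a trailing \r on each line. Here is a small sketch that accepts both line endings and drops empty lines; the helper name split_header_lines() is hypothetical.

```php
<?php
// Hypothetical helper: split a raw header block into clean lines.
function split_header_lines($raw)
{
    // Accept "\r\n" or "\n" as the separator and skip empty entries,
    // such as the blank line that terminates the header block.
    return preg_split('/\r?\n/', $raw, -1, PREG_SPLIT_NO_EMPTY);
}
```

The string returned by curl_exec() could be passed straight to it, i.e. split_header_lines($r) in place of split("\n", $r).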