Hello.
I am building a basic link checker at work using cURL. My application has a function called getHeaders() that returns an array of HTTP headers:
    function getHeaders($url) {
        if (function_exists('curl_init')) {
            // create a new cURL resource
            $ch = curl_init();

            // set URL and other appropriate options
            $options = array(
                CURLOPT_URL            => $url,
                CURLOPT_HEADER         => true,  // include response headers in the output
                CURLOPT_NOBODY         => true,  // send a HEAD request (no response body)
                CURLOPT_FOLLOWLOCATION => true,  // follow Location: redirects
                CURLOPT_RETURNTRANSFER => true,  // return the output instead of printing it
            );
            curl_setopt_array($ch, $options);

            // perform the request; the returned header text is not used here
            curl_exec($ch);

            // note: this is cURL's transfer info array, not the raw header lines
            $headers = curl_getinfo($ch);

            // close cURL resource, and free up system resources
            curl_close($ch);
        } else {
            echo "Error: cURL is not installed on the web server. Unable to continue.\n";
            return false;
        }

        return $headers;
    }

    print_r(getHeaders('mail.google.com'));
Which yields the following results:
    Array
    (
        [url] => http://mail.google.com
        [content_type] => text/html; charset=UTF-8
        [http_code] => 404
        [header_size] => 338
        [request_size] => 55
        [filetime] => -1
        [ssl_verify_result] => 0
        [redirect_count] => 0
        [total_time] => 0.128
        [namelookup_time] => 0.042
        [connect_time] => 0.095
        [pretransfer_time] => 0.097
        [size_upload] => 0
        [size_download] => 0
        [speed_download] => 0
        [speed_upload] => 0
        [download_content_length] => 0
        [upload_content_length] => 0
        [starttransfer_time] => 0.128
        [redirect_time] => 0
    )
I've tested it with several long links, and the function follows the redirects for all of them except, it seems, mail.google.com.
For fun, I passed the same URL (mail.google.com) to the W3C link checker, which produced:
    Results
    Links
    Valid links!
    List of redirects
    The links below are not broken, but the document does not use the exact URL, and the links were redirected. It may be a good idea to link to the final location, for the sake of speed.
    warning Line: 1 http://mail.google.com/mail/ redirected to https://www.google.com/accounts/ServiceLogin?service=mail&passive=true&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3Dl&bsv=zpwhtygjntrz&scc=1&ltmpl=default&ltmplcache=2
    Status: 302 -> 200 OK
    This is a temporary redirect. Update the link if you believe it makes sense, or leave it as is.
    Anchors
    Found 0 anchors.
    Checked 1 document in 4.50 seconds.
That result is correct: the address above is where I am redirected when I enter mail.google.com into my browser.
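For comparison with the checker's "302 -> 200" line, here is a rough sketch of how the status code of each hop could be read from cURL's own output. It assumes curl_exec() is called with CURLOPT_HEADER, CURLOPT_FOLLOWLOCATION and CURLOPT_RETURNTRANSFER set, in which case the returned string holds one header block per response. statusChain() is a hypothetical helper name, and the sample response text is made up for illustration, not captured from a real server.

```php
<?php
// Hypothetical helper: extract the status code of every response in a
// raw header string such as the one curl_exec() returns when headers are
// included and redirects are followed.
function statusChain(string $rawHeaders): array
{
    // Each response starts with a status line like "HTTP/1.1 302 Found"
    // or "HTTP/2 200"; the /m flag lets ^ match at the start of each line.
    preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $rawHeaders, $matches);

    // Convert the captured code strings to integers.
    return array_map('intval', $matches[1]);
}

// Made-up sample: a redirect followed by the final response.
$sample = "HTTP/1.1 302 Found\r\n"
        . "Location: https://www.example.com/login\r\n"
        . "\r\n"
        . "HTTP/1.1 200 OK\r\n"
        . "Content-Type: text/html; charset=UTF-8\r\n"
        . "\r\n";

echo implode(' -> ', statusChain($sample)), "\n"; // prints "302 -> 200"
```

In the function above, that would mean keeping the return value of curl_exec($ch) instead of discarding it, and passing it to the helper.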
What cURL options would I need to use to make my function return 200 for mail.google.com?
Why does the function above return a 404 status code rather than a 302?
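One guess I have not been able to confirm: CURLOPT_NOBODY makes cURL send a HEAD request, and some servers may answer HEAD differently from the GET that a browser sends. Below is a minimal sketch of an options array that can switch between the two, so both can be compared on the same URL. linkCheckOptions() is a hypothetical helper name, and the redirect limit and timeout values are arbitrary assumptions.

```php
<?php
// Hypothetical helper: build cURL options for a link check.
// $useHead = true  -> HEAD request (CURLOPT_NOBODY on, as in my function)
// $useHead = false -> ordinary GET request, which also downloads the body
function linkCheckOptions(string $url, bool $useHead = false): array
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => true,     // include response headers in the output
        CURLOPT_NOBODY         => $useHead, // HEAD when true, GET when false
        CURLOPT_FOLLOWLOCATION => true,     // follow Location: redirects
        CURLOPT_MAXREDIRS      => 10,       // assumed limit to avoid redirect loops
        CURLOPT_TIMEOUT        => 15,       // assumed overall timeout in seconds
        CURLOPT_RETURNTRANSFER => true,     // return the output instead of printing it
    );
}

// Possible usage (performs a network request, so left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, linkCheckOptions('http://mail.google.com/mail/'));
// curl_exec($ch);
// echo curl_getinfo($ch, CURLINFO_HTTP_CODE), "\n";
// curl_close($ch);
```

The W3C output lists http://mail.google.com/mail/ as the URL it checked, so that form may be worth testing as well. A GET transfers the whole body, so it would be slower than HEAD for a link checker.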
TIA