Hello.
I am building a basic link checker at work using cURL. My application has a function called getHeaders() that returns an array of HTTP headers:
    function getHeaders($url)
    {
        if (function_exists('curl_init')) {
            // create a new cURL resource
            $ch = curl_init();

            // set URL and other appropriate options
            $options = array(
                CURLOPT_URL            => $url,
                CURLOPT_HEADER         => true,
                CURLOPT_NOBODY         => true,
                CURLOPT_FOLLOWLOCATION => 1,
                CURLOPT_RETURNTRANSFER => true
            );
            curl_setopt_array($ch, $options);

            // perform the request; with CURLOPT_RETURNTRANSFER set, the
            // response is returned rather than printed (it is discarded here)
            curl_exec($ch);

            // collect information about the last transfer
            $headers = curl_getinfo($ch);

            // close cURL resource, and free up system resources
            curl_close($ch);
        } else {
            echo "Error: cURL is not installed on the web server. Unable to continue.\n";
            return false;
        }

        return $headers;
    }

    print_r(getHeaders('mail.google.com'));
Which yields the following results:
    Array
    (
        [url] => http://mail.google.com
        [content_type] => text/html; charset=UTF-8
        [http_code] => 404
        [header_size] => 338
        [request_size] => 55
        [filetime] => -1
        [ssl_verify_result] => 0
        [redirect_count] => 0
        [total_time] => 0.128
        [namelookup_time] => 0.042
        [connect_time] => 0.095
        [pretransfer_time] => 0.097
        [size_upload] => 0
        [size_download] => 0
        [speed_download] => 0
        [speed_upload] => 0
        [download_content_length] => 0
        [upload_content_length] => 0
        [starttransfer_time] => 0.128
        [redirect_time] => 0
    )
I've tested it with several long links, and the function reports redirects correctly for all of them except, it seems, mail.google.com.
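Since curl_getinfo() only describes the last transfer, it may help to look at every hop. With CURLOPT_HEADER and CURLOPT_RETURNTRANSFER set, curl_exec() returns the header text of each response in the chain, so the status lines can be pulled out of that string. Below is a minimal sketch of that idea; extractStatusCodes() is a hypothetical helper name, and the sample header text is made up for illustration, not real output from any server.

```php
<?php
/**
 * Hypothetical helper (not part of the original getHeaders() function).
 * Given the raw header text returned by curl_exec() when CURLOPT_HEADER
 * and CURLOPT_RETURNTRANSFER are set, return the status code of every
 * response in the redirect chain, in order.
 */
function extractStatusCodes($rawHeaders)
{
    // Each response starts with a status line such as
    // "HTTP/1.1 302 Found" or "HTTP/2 200".
    preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $rawHeaders, $matches);

    // Convert the captured strings to integers.
    return array_map('intval', $matches[1]);
}

// Made-up sample header text, for illustration only.
$sample = "HTTP/1.1 302 Found\r\nLocation: https://example.com/login\r\n\r\n"
        . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";

print_r(extractStatusCodes($sample));
```

In getHeaders(), this would mean keeping the return value of curl_exec($ch) instead of discarding it and passing that string to the helper.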
For fun, I passed the same URL (mail.google.com) to the W3C link checker, which produced:
    Results
    Links
    Valid links!

    List of redirects
    The links below are not broken, but the document does not use the exact URL, and the links were redirected. It may be a good idea to link to the final location, for the sake of speed.

    warning Line: 1 http://mail.google.com/mail/ redirected to https://www.google.com/accounts/ServiceLogin?service=mail&passive=true&rm=false&continue=http%3A%2F%2Fmail.google.com%2Fmail%2F%3Fui%3Dhtml%26zy%3Dl&bsv=zpwhtygjntrz&scc=1&ltmpl=default&ltmplcache=2
    Status: 302 -> 200 OK
    This is a temporary redirect. Update the link if you believe it makes sense, or leave it as is.

    Anchors
    Found 0 anchors.

    Checked 1 document in 4.50 seconds.
That is correct: the address above is where my browser ends up when I enter mail.google.com.
What cURL options would I need to use to make my function return 200 for mail.google.com?
Why does the function above return a 404 status code rather than a 302?
TIA