This is my code:
<?php
$url = "http://www.uhasselt.be/collegeroosters/2009_2010_298_5_10.html";
$headers = get_headers($url, 1);
print_r($headers);
$contloc = $headers["Content-Location"];
echo "Content-Location: " . $contloc . "\n";
$soft404test = strpos($contloc, "http://www.uhasselt.be/404b.htm") ? true : false;
var_dump($soft404test);
?>
This is its output:
Array
(
[0] => HTTP/1.1 200 OK
[Content-Length] => 2030
[Content-Type] => text/html
[Content-Location] => http://www.uhasselt.be/404b.htm?404;http://www.uhasselt.be:80/collegeroosters/2009_2010_298_5_10.html
[Last-Modified] => Mon, 22 Aug 2005 07:10:22 GMT
[Accept-Ranges] => bytes
[ETag] => "88a8b68fe8a6c51:31c9e"
[Server] => Microsoft-IIS/6.0
[MicrosoftOfficeWebServer] => 5.0_Pub
[X-Powered-By] => ASP.NET
[Date] => Tue, 24 Nov 2009 08:40:25 GMT
[Connection] => close
)
Content-Location: http://www.uhasselt.be/404b.htm?404;http://www.uhasselt.be:80/collegeroosters/2009_2010_298_5_10.html
bool(false)
This behavior is unexpected: I was expecting bool(true), because the Content-Location value contains the 404 page URL. What I am trying to do is detect soft 404s by looking at the Content-Location header of the HTTP response. The strpos function returns a result I don't understand. Where did I go wrong? (I don't need this to work on other sites, by the way.)
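One likely explanation, sketched here rather than stated as the definitive fix: strpos returns the integer offset of the first match, and only returns false when there is no match. In the output above the 404 URL sits at the very start of the header, so strpos returns 0, and the ternary treats 0 as false. A strict comparison against false avoids that. The helper name isSoft404 and its default parameter are hypothetical, introduced only for illustration, and the nullable type hint assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the original code): reports whether a
// Content-Location value points at the site's 404 page.
function isSoft404(?string $contentLocation, string $errorPage = "http://www.uhasselt.be/404b.htm"): bool
{
    // The header may be absent, in which case there is nothing to match.
    if ($contentLocation === null) {
        return false;
    }
    // strpos returns an int offset (possibly 0) on a match and false otherwise,
    // so compare strictly against false instead of relying on truthiness.
    return strpos($contentLocation, $errorPage) !== false;
}

// Header value copied from the output shown in the question.
$contloc = "http://www.uhasselt.be/404b.htm?404;http://www.uhasselt.be:80/collegeroosters/2009_2010_298_5_10.html";

var_dump(strpos($contloc, "http://www.uhasselt.be/404b.htm")); // int(0)
var_dump(isSoft404($contloc));                                 // bool(true)
```

With the original script, the same idea would be to read the header as `$headers["Content-Location"] ?? null` and pass that to the helper, so a missing header does not raise a notice.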