Hiya, I'm trying to write a sitemap.php which acts differently depending on who is looking.

I want to redirect crawlers to my sitemap.xml, as that will be the most up-to-date page and will contain all the info they need, but I want my regular readers to be shown an HTML sitemap on the PHP page.

This will all be controlled from within the PHP header. I found this code on the web which, by the looks of it, should work, but it doesn't. Can anyone help crack this for me?

function getIsCrawler($userAgent) {
    $crawlers = 'firefox|Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|' .
    'AcioRobot|ASPSeek|CocoCrawler|Dumbot|FAST-WebCrawler|' .
    'GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';
    $isCrawler = (preg_match("/$crawlers/i", $userAgent) > 0);
    return $isCrawler;
}

$crawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);

if ($isCrawler) {
    header('Location: http://www.website.com/sitemap.xml');
} else {
    echo "not crawler!";
}

It looks pretty simple, but as you can see I've added firefox to the agent list, and sure enough I'm not being redirected.

Thanks for any help :)

+8  A: 

You have a mistake in your code:

$crawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);

should be

$isCrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);

If you develop with notices on you'll catch these errors much more easily.
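As a rough sketch of what "notices on" could look like at the top of a development script (this assumes you are able to change these settings at runtime and that this is not a production server):

```php
<?php
// Report every class of error, including reads of undefined variables.
error_reporting(E_ALL);

// Assumption: development machine only. Showing errors in the page
// output is usually not something you want on a live site.
ini_set('display_errors', '1');
```

With this in place, the `if ($isCrawler)` test in the question would report an undefined variable (a notice in older PHP versions, a warning in PHP 8) instead of silently evaluating to false.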

Also, you probably want to exit after the header() call, so the rest of the script doesn't run once the redirect has been sent.
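Putting both points together, the script might end up looking something like this. It is only a sketch: the 'firefox' test entry is dropped from the list, and the `isset()` fallback for a missing user agent header is my own addition, not part of the original code.

```php
<?php
function getIsCrawler($userAgent) {
    // Same list as in the question, minus the 'firefox' test entry.
    $crawlers = 'Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|' .
    'AcioRobot|ASPSeek|CocoCrawler|Dumbot|FAST-WebCrawler|' .
    'GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';
    return preg_match("/$crawlers/i", $userAgent) > 0;
}

// Assumption (not in the original): fall back to an empty string when
// the client sends no User-Agent header, to avoid an undefined index.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// The result is stored under the same name that is tested below.
$isCrawler = getIsCrawler($userAgent);

if ($isCrawler) {
    header('Location: http://www.website.com/sitemap.xml');
    exit; // stop here so nothing else runs after the redirect header
} else {
    echo "not crawler!";
}
```

Bear in mind that a check like this depends entirely on the user agent string, which any client can set to whatever it likes.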

Greg
+1 for the notice suggestion
Eineki
+1 for the Eineki...
Ahmet Kakıcı
Doh! can't believe I missed that. Good suggestion on notices too, shall do that.
hfidgen
+1  A: 

http://develobert.blogspot.com/2008/11/php-robot-check.html

joebert
Nice one - that's a neat way of doing it. Unfortunately this site is being developed on IIS, not Apache - so no .htaccess or anything like it which I can use :x
hfidgen
A: 

Since when is "firefox" a crawler?

Mihai Secasiu
Yeah, I know it's a question, not an answer, but I was pointing out a problem with that function. Why did I get -1?
Mihai Secasiu
dunno who -1'd you, but if you read my post you'll see I added firefox as a testing tool :)
hfidgen
I think you should have added a comment to the question.
Cristian Ciupitu
A: 

There is a web crawler identification API available at www.atlbl.com that I have used before to customise content for Google et al., and to block crawlers that don't honour my robots.txt.