I'm going to block all bots except the big search engines. One of my blocking methods will be to check the Accept-Language header: if a request has no Accept-Language header, the bot's IP address will be blocked until 2037. Googlebot does not send Accept-Language, so I want to verify it with a DNS lookup:

<?php
// Reverse DNS lookup of the client's IP address
$hostname = gethostbyaddr($_SERVER['REMOTE_ADDR']);
?>

Is it OK to use gethostbyaddr? Can someone bypass my "gethostbyaddr protection"?

+1  A: 
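//The function
function is_google() {
    // strpos() returns 0 for a match at the start of the string, so compare against false explicitly
    return isset($_SERVER['HTTP_USER_AGENT'])
        && strpos($_SERVER['HTTP_USER_AGENT'], "Googlebot") !== false;
}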
Cristian
"Googlebot" doesn't mean that it is the real Googlebot.
ilhan
Of course not, but it's not a big deal after all... what can a user who fakes the user agent do? Maybe create a Google clone, yeah, that would be a nice project.
Cristian
No big deal. All they can do is crawl your entire site, regurgitate it with better SEO than yours (since they've honed how to rank w/o having to worry about details like quality content), then they use their link farm w/ high PR to compete w/ you in Google ranking, on your own site content.
joedevon
+5  A: 

How to verify Googlebot.
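
To the question of whether a bare gethostbyaddr check can be bypassed: yes, because whoever controls the reverse DNS (PTR) record for an IP address can make it return any host name. The usual remedy is a forward-confirmed reverse lookup: resolve the IP to a host name, check the domain, then resolve that host name back and confirm it yields the same IP. Below is a minimal sketch of that idea, not a definitive implementation. The function name is_verified_googlebot and its two optional resolver parameters are hypothetical (the parameters only exist so the logic can be tried without real DNS), the accepted domains googlebot.com and google.com are an assumption that should be checked against Google's current guidance, and gethostbynamel only returns IPv4 addresses.

```php
<?php
// Sketch: forward-confirmed reverse DNS check (hypothetical helper, not from the original answer).
// $reverse and $forward are optional hooks; by default the real DNS functions are used.
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP on failure
    // and false on malformed input.
    $host = $reverse($ip);
    if ($host === false || $host === $ip) {
        return false;
    }

    // Step 2: the host name must end in an accepted domain (assumed list).
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Step 3: forward lookup. The host name must resolve back to the original IP,
    // otherwise the PTR record may have been spoofed.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

With something like this in place, a request that lacks Accept-Language would only be blocked when is_verified_googlebot($_SERVER['REMOTE_ADDR']) returns false. Since DNS lookups are slow, caching the result per IP address is probably worthwhile.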

Marcel Korpel
In fact, that's a better method than mine. That's what I love about SO... you learn something every day. Thanks!
Cristian
@Cristian – To be frank, I think yours is good enough. The price of a false positive is very low, I think. I'm more worried about false negatives in this case: ordinary people with a UA that somehow doesn't send an `Accept-Language` header (don't ask me which; a quick test revealed that curl doesn't send one).
Marcel Korpel