I have some geo-targeting code which I want to behave in a particular way if the site is being spidered by a robot, e.g. Google.
Is there any way to infer this?
Presenting different content to search engine crawlers and human visitors - called cloaking - is a risky thing, and can be punished by the search engine if detected.
That said, check out this SO answer with several links to well-maintained "bot lists". You would have to parse the USER_AGENT string and compare it against such a bot list.
You can check this via the user-agent property. For more info on user agent strings, check here: http://www.user-agents.org/ and look for the records marked with type "R = Robot, crawler, spider". But this is not guaranteed: the user-agent property can be changed by several factors, so it is not 100% reliable.
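As a rough illustration of the user-agent approach, here is a minimal Python sketch. The `BOT_TOKENS` tuple and the `looks_like_bot` helper are names made up for this example, and the token list is deliberately tiny; in practice you would load the tokens from one of the maintained bot lists.

```python
# Hypothetical, deliberately short list of crawler tokens (an assumption for
# this example); a real list would come from a maintained bot list.
BOT_TOKENS = ("googlebot", "bingbot", "msnbot", "slurp")


def looks_like_bot(user_agent):
    """Return True if the User-Agent string contains a known crawler token."""
    ua = (user_agent or "").lower()  # tolerate a missing header
    return any(token in ua for token in BOT_TOKENS)
```

Since the header is trivially spoofed, a match here only means "claims to be a bot", not "is a bot".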
You can do it by checking the user-agent or the IP. The latter may be preferable, as it's not unknown for other, less reputable bots to spoof the user-agent of the big guys. Even for Google et al., their IPs tend to be in narrow ranges, so detecting on IP shouldn't require compiling vast lists.
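A small sketch of the IP-range idea, using Python's standard `ipaddress` module. The network in `CRAWLER_NETWORKS` is an assumption chosen purely for illustration and is not an authoritative or complete list; `ip_in_crawler_range` is likewise a hypothetical helper name.

```python
import ipaddress

# Illustrative range only (an assumption, not an authoritative list);
# take the real ranges from whatever the crawler operator publishes.
CRAWLER_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]


def ip_in_crawler_range(ip):
    """Return True if the address falls inside one of the listed networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IP address at all
    return any(addr in network for network in CRAWLER_NETWORKS)
```

Ranges change over time, so a hard-coded list like this needs periodic refreshing.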
If you are only interested in the well set up reputable bots e.g. Google, Yahoo, MSN/Live/Bing/whatever-it-is-today, Ask etc then you can use round trip DNS checking.
1) Check for known user agent (look for known substring such as googlebot)
e.g. Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2) Do a reverse DNS for the requesting IP and check that it comes from a reasonable domain.
e.g. rdns of 66.249.71.202 is crawl-66-249-71-202.googlebot.com (so happy that it comes from googlebot.com)
3) On its own, step 2 can be faked (whoever controls the IP's reverse DNS can make it return any name), so now do a forward DNS lookup of the A record for the hostname returned in step 2 and ensure you get back the original requesting IP.
e.g. dns for above is
crawl-66-249-71-202.googlebot.com. A 66.249.71.202
66.249.71.202 was the requesting IP address so this is a valid googlebot.
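Steps 2 and 3 above could be sketched in Python roughly as follows. This is only a sketch under some assumptions: `TRUSTED_SUFFIXES` and `verify_crawler_ip` are names invented for the example, the suffix list covers only the googlebot.com case shown above, and the lookups are IPv4-only. The two lookup functions are injectable so the logic can be exercised without hitting real DNS.

```python
import socket

# Assumed suffix list for illustration; extend it for the other crawlers
# you care about.
TRUSTED_SUFFIXES = (".googlebot.com",)


def verify_crawler_ip(ip, reverse_lookup=None, forward_lookup=None):
    """Round-trip DNS check: IP -> hostname -> IPs, which must include ip."""
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist); IPv4 only
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        hostname = reverse_lookup(ip)                  # step 2: reverse DNS
        host = hostname.rstrip(".").lower()
        if not host.endswith(TRUSTED_SUFFIXES):
            return False                               # not a trusted domain
        return ip in forward_lookup(hostname)          # step 3: forward DNS
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return False
```

DNS lookups are slow, so it is probably worth running this only after the cheap user-agent check in step 1 matches, and caching the result per IP.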