Is it possible to write code to detect whether someone is spidering your website's content?
A good spider:
- reads robots.txt
- sends a proper user-agent header
- queries faster than an average user
But reliably telling whether a given client is a browser or a spider is not possible, I think.
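A minimal sketch of checking the first trait offline: it flags clients that request many pages but never fetch robots.txt. The `(ip, path)` record format and the threshold are assumptions; adapt them to however you parse your access log.

```python
from collections import defaultdict

PAGE_THRESHOLD = 50  # hypothetical: how many pages before we care about a client

def clients_ignoring_robots_txt(records):
    """records: iterable of (ip, path) tuples, e.g. parsed from an access log."""
    fetched_robots = set()
    page_counts = defaultdict(int)
    for ip, path in records:
        if path == "/robots.txt":
            fetched_robots.add(ip)
        else:
            page_counts[ip] += 1
    # Heavy clients that never looked at robots.txt are suspicious.
    return {ip for ip, count in page_counts.items()
            if count >= PAGE_THRESHOLD and ip not in fetched_robots}
```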
You can try using the user-agent string to identify bots.
Different bots seem to have different user agent strings:
http://www.useragentstring.com/pages/useragentstring.php
However, the user-agent string can easily be spoofed.
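A minimal sketch of that check, assuming you have the request's User-Agent header as a string. The marker substrings below are just common examples; a real list (e.g. from useragentstring.com) is much longer, and as noted the header can be spoofed.

```python
KNOWN_BOT_MARKERS = ("googlebot", "bingbot", "slurp", "duckduckbot",
                     "baiduspider", "yandexbot", "crawler", "spider", "bot")

def looks_like_bot(user_agent):
    """True if the User-Agent contains a substring commonly used by bots."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)

# Example:
# looks_like_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
# -> True
```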
You can check against a list of User-Agent strings that common bots use. You can also apply some form of rate detection: a very high request rate probably indicates a spider (or someone leeching your entire site).
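A minimal sliding-window sketch of that rate detection, assuming you call `record_hit(ip)` once per request. The window and threshold are made-up numbers; tune them to what an average user actually does on your site.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10
MAX_REQUESTS_PER_WINDOW = 30  # hypothetical: more than a human would plausibly do

_hits = defaultdict(deque)  # ip -> timestamps of that client's recent requests

def record_hit(ip, now=None):
    """Record one request; return True if the client exceeds the rate threshold."""
    now = time.time() if now is None else now
    recent = _hits[ip]
    recent.append(now)
    # Drop timestamps that fell out of the window.
    while recent and now - recent[0] > WINDOW_SECONDS:
        recent.popleft()
    return len(recent) > MAX_REQUESTS_PER_WINDOW
```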
There might also be lists of IP addresses used by common bots, but a foolproof detection system is most likely impossible.
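A related check that avoids maintaining IP lists by hand is reverse-DNS verification, which Google documents for Googlebot: reverse-resolve the client IP, check the hostname suffix, then forward-resolve the hostname and make sure it points back to the same IP. This is a sketch of that idea only; it costs DNS lookups, so cache the result in real use.

```python
import socket

GOOGLE_HOST_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip):
    """True if ip reverse-resolves to a Google hostname that resolves back to it."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]               # reverse DNS lookup
        if not hostname.endswith(GOOGLE_HOST_SUFFIXES):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward lookup
        return ip in forward_ips                             # must point back to the same IP
    except OSError:                                          # lookup failed or no PTR record
        return False
```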
You could add a link to your pages that a real visitor would never click, and flag anyone who does follow it as a spider. You will still get a few people clicking it anyway; curiosity cannot be avoided entirely.
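A minimal sketch of that trap as a Flask app (Flask and the `/trap-page` path are assumptions; any framework works). The link is hidden with CSS and disallowed in robots.txt, so humans and polite spiders never request it, and whoever does gets flagged.

```python
from flask import Flask, request

app = Flask(__name__)
flagged_ips = set()  # in-memory for the sketch; use a persistent store in practice

@app.route("/")
def index():
    # The trap link is present in the HTML but invisible to human visitors.
    return 'Welcome!<a href="/trap-page" style="display:none">do not follow</a>'

@app.route("/robots.txt")
def robots():
    # Polite spiders are told to stay away, so only impolite ones hit the trap.
    return "User-agent: *\nDisallow: /trap-page\n", 200, {"Content-Type": "text/plain"}

@app.route("/trap-page")
def trap():
    flagged_ips.add(request.remote_addr)  # flag whoever followed the hidden link
    return "", 204
```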
If the spider is nice, you can detect it through its user-agent, using a list of known user-agent strings like the one linked above. But a nice web spider usually also follows the robots.txt conventions.
Robots that ignore the robots.txt file and spoof their user-agent most likely also use other means to hide that they are a spider.