tags:
views: 19
answers: 4

Is there any way in web development to ensure that web crawlers cannot crawl your website?

+1  A: 

You could place a robots.txt file with the following contents at the root of your site, which will prevent the civilized robots from indexing it:

User-agent: *
Disallow: /

Note that this won't prevent uncivilized robots from indexing it. The only way to deter them is to use techniques such as CAPTCHA.

Of course it is preferred to use a dedicated development machine which is not accessible from the internet while your site is under construction.

Darin Dimitrov
+3  A: 

Ensure? No.

You can ask politely with robots.txt (but it can be ignored), you can put up barriers with CAPTCHA (but they can be defeated and impose a burden on ordinary users), and you can monitor each visitor's behaviour looking for bot patterns (but bots can cycle through proxies and rate-limit themselves to blend in).
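The last approach, watching for bot-like request patterns, can be as simple as a sliding-window rate check per client IP. This is a minimal sketch (class and threshold names are my own, not from any particular library):

```python
import time
from collections import defaultdict, deque

class RateMonitor:
    """Flag a client that makes more than `max_requests` requests
    within a sliding window of `window_seconds`."""

    def __init__(self, max_requests=30, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def is_bot_like(self, ip, now=None):
        now = time.monotonic() if now is None else now
        window = self.hits[ip]
        window.append(now)
        # Discard timestamps that have aged out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests
```

A real deployment would also need to handle shared IPs (NAT, corporate proxies) and decide what to do with flagged clients (throttle, challenge, or block), which is where the trade-offs mentioned above come in.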

David Dorward
A: 

Use robots.txt to allow or disallow robots to index your website.

Kangkan
Note that `robots.txt` can be ignored by crawlers, as David and Darin both mentioned.
Bart Kiers
+1  A: 

You could also deny access based on the crawler's user agent; of course, this assumes that the crawler uses a user agent different from a regular browser's.
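As a rough sketch, the check is just a substring match on the User-Agent header before serving the request; the pattern list below is illustrative, not exhaustive:

```python
# Hypothetical list of substrings found in common crawler user agents.
BLOCKED_AGENT_SUBSTRINGS = ("googlebot", "bingbot", "slurp", "crawler", "spider")

def is_blocked_user_agent(user_agent):
    """Return True if the User-Agent header matches a known crawler pattern."""
    ua = (user_agent or "").lower()
    return any(pattern in ua for pattern in BLOCKED_AGENT_SUBSTRINGS)
```

The same check could equally be expressed as a web server rule (e.g. matching the User-Agent header in Apache or nginx configuration) rather than application code.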

Matthew Lock
"Bad" crawlers can always fake the user agent, so it is also just one of methods which can help, but mot prohibit them
Laimoncijus