views: 22

answers: 1
I need guidance on using robots.txt. My problem is as follows.

I have one live website, "www.faisal.com" (or "faisal.com"), and two testing web servers:

"faisal.jupiter.com" and "faisal.dev.com"

I want one robots.txt to handle all of this. I don't want crawlers to index pages from "faisal.jupiter.com" or "faisal.dev.com"; they should only be allowed to index pages from "www.faisal.com" or "faisal.com".

I want a single robots.txt file that will sit on all the web servers and allow indexing of the live website only.

+1  A: 

The Disallow directive specifies only relative URLs, so I guess you cannot have the same robots.txt file for all of them.
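For reference, here is roughly what the file would have to contain on the dev/test servers. There is no robots.txt directive that matches a hostname, so the same blanket block would also de-index the live site if one shared file were deployed everywhere:

    User-agent: *
    Disallow: /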

Why not force HTTP authentication on the dev/test servers?

That way the robots won't be able to crawl these servers.
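As a minimal sketch, assuming Apache, something like this in an .htaccess file on each dev/test server would do it (the realm name and password-file path below are placeholders):

    # require a login before serving anything on this host
    AuthType Basic
    AuthName "Restricted dev server"
    AuthUserFile /etc/apache2/.htpasswd
    Require valid-user

Create the password file with "htpasswd -c /etc/apache2/.htpasswd someuser". Well-behaved crawlers then get a 401 Unauthorized response and index nothing.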

It seems like a good idea if you want to allow specific people to check them, but not everybody trying to find flaws in your not-yet-debugged new version ...

Especially now that you have given the addresses to everybody on the web.

siukurnin