views: 616
answers: 6
Help! Google has indexed a test folder on my website that no one but me was supposed to know about :( How do I stop Google from indexing certain links and folders?

+2  A: 

Use robots.txt.

Google for it, or check out: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
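
For example, a minimal robots.txt placed at the root of your site might look like this (the /test/ path is just a placeholder for whatever your folder is actually called):

User-agent: *
Disallow: /test/

That asks every well-behaved crawler to stay out of /test/ and everything beneath it.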

Andrew Jaffe
+11  A: 

Use a robot exclusion file, or better yet password protect your test areas! Using a robots.txt file to "protect" areas you don't want others to see is a little like hanging a sign on your back door saying "I've left this open but please don't come in" :)

If you sign up for Google webmaster tools, you can request removal of a search result if you ensure it's no longer accessible by their crawler.
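
If you want to cut off access quickly before filing the removal request, a sketch of an Apache .htaccess dropped into the test folder could be as simple as this (assuming Apache 2.2-style access directives; Apache 2.4 would use Require all denied instead):

# deny all requests to this directory, crawlers and humans alike
Order deny,allow
Deny from all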

Paul Dixon
+1 for 'Using a robots.txt file to "protect" areas you don't want others to see is a little like hanging a sign on your back door saying "I've left this open but please don't come in" :)'
Unkwntech
+2  A: 

The best way to keep crawlers from indexing some of your content is a robots.txt file at the root of your site.

Here is an example:

User-agent: *
Allow: /
Crawl-delay: 5

User-agent: *
Disallow: /cgi-bin
Disallow: /css
Disallow: /img
Disallow: /js

In the first block I'm telling the crawler it can browse everything.

The second block lists the folders I want it to avoid.

This is not a safe way to really protect your content, since some crawlers simply do not respect the file.

If you really want to protect it, the best approach is a .htaccess file in those folders to force authentication.
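
A rough sketch of such a .htaccess, assuming Apache's basic authentication modules are enabled and that you create the password file yourself with the htpasswd tool (the /path/to/.htpasswd location is just a placeholder):

# require a login for everything in this directory
AuthType Basic
AuthName "Restricted test area"
AuthUserFile /path/to/.htpasswd
Require valid-user

You'd create the password file with something like: htpasswd -c /path/to/.htpasswd yourusername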

Gustavo Carreno
You don't need an "allow" parameter. Everything but the disallowed folders and files will be indexed by default.
TFM
The issue with robots.txt isn't so much the crawlers that ignore it as the crawlers that treat it as a list of the most interesting things to examine and make a special point to go wherever you tell them not to.
Dave Sherohman
@Kent I just copy/pasted my own. Yes you don't need the allow for this purpose.
Gustavo Carreno
@Dave like someone said: "I've left this open, please don't come in" ;)
Gustavo Carreno
+2  A: 

Beware! You can tell "nice" bots (like google) to stay away from certain places, but other bots don't play that nice. So the only way to solve this properly is to add some restrictions to the places that are not considered "public". You could restrict access to some IP addresses you trust, or you could add username/password authentication.
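
A sketch of the IP-based variant in an Apache .htaccess (again assuming Apache 2.2-style directives; the 203.0.113.5 address is just a placeholder for your own IP):

# block everyone except the listed address
Order deny,allow
Deny from all
Allow from 203.0.113.5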

thijs
+1  A: 

Maybe the right answer is to not put test code on a public web site. Why is it part of your deployment at all?

duffymo
Well, the site is a small social networking site, and the test area is just for testing new modules to make sure they work in the same online environment before synchronising them with the actual website.
Ali
I'd say it ought to be done on another machine, not the production hardware.
duffymo
+2  A: