I have a situation where I want to disallow the crawling of certain pages within a directory. The directory contains a large number of files, but there are a few that I still need indexed. Going through and disallowing each page individually would make for a very large robots.txt file. Is there a way to disallow a folder in robots.txt except for certain files?

A: 

There is a non-standard extension to the robots.txt format for specifying "Allow" rules. Not every bot honors it, and the bots that do may interpret it differently from one another.

You can read more about it in this Wikipedia article: http://en.wikipedia.org/wiki/Robots_exclusion_standard#Allow_directive
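For example, here is a minimal sketch of the approach, with hypothetical path names. The exceptions are listed before the blanket Disallow, since some bots apply the first rule that matches, while others (Googlebot, for instance) give precedence to the most specific matching rule:

User-agent: *
# Allow the few files that should still be crawled (non-standard directive)
Allow: /folder/page-one.html
Allow: /folder/page-two.html
# Block everything else in the folder
Disallow: /folder/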

kbrimington
A: 

To get that sort of fine-grained control, you might be better off using a robots meta tag in your HTML. That is assuming the files in question are all HTML.

<meta name="robots" content="noindex" />

This should be placed in the head of your document.
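For example, a minimal sketch of the placement (the title is just a placeholder):

<html>
  <head>
    <title>Example page</title>
    <!-- Tells compliant crawlers not to index this page -->
    <meta name="robots" content="noindex" />
  </head>
  <body><!-- page content --></body>
</html>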

I also find these tags easier to maintain than robots.txt.

Richard