+3  Q: 

Dynamic robots.txt

Let's say I have a web site for hosting community generated content that targets a very specific set of users. Now, let's say in the interest of fostering a better community I have an off-topic area where community members can post or talk about anything they want, regardless of the site's main theme.

Now, I want most of the content to get indexed by Google. The notable exception is the off-topic content. Each thread has its own page, but all the threads are listed in the same folder, so I can't just exclude search engines from a folder somewhere. It has to be per-page. A traditional robots.txt file would get huge, so how else could I accomplish this?

+1  A: 

If using Apache, I'd use mod_rewrite to alias robots.txt to a script that dynamically generates the necessary content.

Edit: If using IIS, you could use ISAPI_Rewrite to do the same.
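
As a rough illustration of the kind of script robots.txt could be aliased to, here is a minimal Python sketch. The function name build_robots_txt and the /threads/... paths are made up for the example; a real site would pull the off-topic thread URLs from its own database, and the web server wiring is left out.

```python
def build_robots_txt(off_topic_paths, user_agent="*"):
    """Return robots.txt text with one Disallow line per off-topic thread.

    off_topic_paths is assumed to be an iterable of URL paths such as
    "/threads/1042" (hypothetical; substitute the site's real URLs).
    """
    lines = [f"User-agent: {user_agent}"]
    paths = sorted(set(off_topic_paths))  # de-duplicate, stable order
    if not paths:
        # An empty Disallow means "nothing is blocked".
        lines.append("Disallow:")
    for path in paths:
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical thread paths, standing in for a database query.
    print(build_robots_txt(["/threads/1042", "/threads/977"]), end="")
```

The generated file still grows with the number of off-topic threads, so this only moves the bookkeeping into code; it does not make the output smaller.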

James Marshall
+12  A: 

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> This will work for all well-behaved search engines; just add it to the head of each page you want excluded.
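
A sketch of how the tag could be emitted only on off-topic pages, assuming the page template knows whether a thread is off-topic. The names render_head and is_off_topic are invented for the example, and Python is used purely for illustration; the same conditional fits in any templating system.

```python
from html import escape

# The tag from above, in lowercase; attribute names and values are
# case-insensitive in HTML.
ROBOTS_NOINDEX = '<meta name="robots" content="noindex, nofollow">'


def render_head(title, is_off_topic):
    """Build a page's <head>, adding the robots tag for off-topic threads.

    is_off_topic is a hypothetical per-thread flag; a real site would
    read it from wherever the thread's category is stored.
    """
    parts = ["<head>", f"<title>{escape(title)}</title>"]
    if is_off_topic:
        parts.append(ROBOTS_NOINDEX)
    parts.append("</head>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_head("Weekend plans", is_off_topic=True))
```

This keeps the decision per-page, as the question asks, without any central list of excluded URLs.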


Similarly to @James Marshall's suggestion: in ASP.NET you could use an HttpHandler to route requests for robots.txt to a script that generates the content.

Ian Nelson