I need to disallow indexing of two pages, one of them dynamic:
site.com/news.php
site.com/news.php?id=__
site.com/news-all.php

What should I write in robots.txt?

User-agent: *  
Disallow: /news

or

Disallow: /news*

or

Disallow: /news.php*  
Disallow: /news-all.php

Should one use a wildcard at the end or not?

+2  A: 
User-agent: *
Disallow: /news.php?id=*

User-agent: *
Disallow: /news-all.php

More info here

EDIT:

The first rule will disallow news.php with the ?id=__ parameter but still allow news.php without it. If you do not want news.php crawled either, then you have to use /news.php*

Goran Jurić
+1  A: 

For sure

Disallow: /news.php
Disallow: /news-all.php

is correct. No stars are needed if you have the full filename. I would be interested to know, though, whether the

Disallow: /news*

approach can work.
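
One way to sanity-check the two-line version is to feed it to a parser and ask about each URL. Below is a minimal sketch using Python's standard urllib.robotparser module, which does plain prefix matching. The site.com URLs come from the question; /index.php and the helper name is_allowed are made up for illustration. Real crawlers may behave differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed above, with no wildcards.
rules = """\
User-agent: *
Disallow: /news.php
Disallow: /news-all.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(url: str) -> bool:
    """Hypothetical helper: may a generic bot fetch this URL?"""
    return parser.can_fetch("*", url)


urls = [
    "http://site.com/news.php",
    "http://site.com/news.php?id=5",
    "http://site.com/news-all.php",
    "http://site.com/index.php",  # assumed page that should stay crawlable
]

for url in urls:
    print(url, "allowed" if is_allowed(url) else "blocked")
```

With this parser, the three news URLs should come back as blocked and the unrelated page as allowed, because each rule is matched as a prefix of the path plus query string.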

Teo
A: 

The Allow and Disallow lines in robots.txt say, "allow (or disallow) anything that starts with".

So:

Disallow: /news.php

is the same as

Disallow: /news.php*

Provided, of course, that the bot reading robots.txt understands wildcards. If the bot doesn't understand wildcards, then it will treat the asterisk as a part of the actual file name.

An asterisk at the end of the line is superfluous, and potentially hazardous.
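
To illustrate the difference, here is a rough sketch of the two behaviours in Python. Both functions are hypothetical and simplified: literal_prefix_match models a bot that does not understand wildcards, and wildcard_match models one that treats * as "any run of characters". Neither is taken from an actual crawler.

```python
import re


def literal_prefix_match(rule: str, path: str) -> bool:
    """Bot without wildcard support: the rule is a plain prefix,
    and '*' is treated as an ordinary character of the file name."""
    return path.startswith(rule)


def wildcard_match(rule: str, path: str) -> bool:
    """Bot with wildcard support (assumed behaviour): '*' matches any
    run of characters, and the rule is still matched from the start."""
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None


path = "/news.php?id=5"

# A wildcard-aware bot sees no difference between the two rules.
print(wildcard_match("/news.php", path), wildcard_match("/news.php*", path))

# A bot without wildcard support looks for a literal '*' and finds none.
print(literal_prefix_match("/news.php", path), literal_prefix_match("/news.php*", path))
```

Under these assumptions the rule without the asterisk blocks the page for both kinds of bot, while the rule with the asterisk only works for the wildcard-aware one. The same sketch also shows why /news and /news* are broad: both would match an unrelated path such as /newsletter.php.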

Jim Mischel