views:

20

answers:

1

I would like to prevent web robots from accessing a URL like this:

http://www.example.com/export

while still allowing this kind of URL:

http://www.example.com/export?foo=value1

A spider bot is calling /export without a query string, causing a lot of errors in my log.
Is there a way to set up this filter in robots.txt?

+1  A: 

I am assuming your problem is with bots hitting the first URL in your example.

As said in the comment, this is probably not possible, because http://www.example.com/export is the resource's base URL. Even if it were possible as per the standard, I wouldn't trust bots to understand this properly.
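For completeness: some major crawlers (notably Googlebot and Bingbot) do support nonstandard pattern extensions in robots.txt, such as the `$` end-of-URL anchor. A rule along these lines *might* work for those particular bots, but it is not part of the original robots.txt standard, and other crawlers will ignore it or misinterpret it:

```
User-agent: *
# Nonstandard extension: "$" anchors the pattern to the end of the URL,
# so this matches /export exactly but not /export?foo=value1.
Disallow: /export$
```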

I would also not send a 401 Unauthorized, 403 Forbidden, or similar status if the URL is called without a query string, for the same reason: a bot could conclude that the resource is off-limits entirely.

What I would do in your situation is, if somebody arrives at

 http://www.example.com/export

send a 301 Moved Permanently redirect to the same URL with a query string carrying some default values, like

 http://www.example.com/export?foo=0

This should keep the search engine index clean. (It won't fix the logging problem you mention, though.)
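The redirect logic above can be sketched in a few lines. This is a minimal illustration, not a drop-in implementation: the `route` helper, the `foo=0` default, and the paths are all assumptions taken from the example URLs, and in practice you would wire the same check into whatever framework or server config you use.

```python
from urllib.parse import urlencode, urlparse

# Hypothetical default query string to redirect bare /export requests to.
DEFAULT_QUERY = {"foo": "0"}

def route(request_uri):
    """Decide how to answer a request.

    Returns (status, location): a 301 with a Location for /export
    without a query string, otherwise a plain 200.
    """
    parsed = urlparse(request_uri)
    if parsed.path == "/export" and not parsed.query:
        # Bare /export: redirect permanently to a URL with defaults,
        # so crawlers update their index instead of retrying the bare URL.
        return 301, "/export?" + urlencode(DEFAULT_QUERY)
    # Any /export?... request (or other path) is served normally.
    return 200, None
```

For example, `route("/export")` yields `(301, "/export?foo=0")`, while `route("/export?foo=value1")` yields `(200, None)`.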

Pekka