I have links with this structure:

  • http://www.example.com/tags/blah
  • http://www.example.com/tags/blubb
  • http://www.example.com/tags/blah/blubb (for all items that match BOTH tags)

I want Google & co. to spider all links that have ONE tag in the URL, but NOT the URLs that combine two or more tags.

Currently I use the HTML meta tag "robots" with "noindex, nofollow" to solve the problem.
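
For reference, that approach is just this tag in the <head> of every multi-tag page (a minimal sketch; the bot still has to download the page before it sees the tag, which is where the additional traffic comes from):

    <meta name="robots" content="noindex, nofollow">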

Is there a robots.txt solution (one that works at least for some search bots), or do I need to stick with "noindex, nofollow" and live with the additional traffic?

+1  A: 

I don't think you can do it with robots.txt alone. The original standard is pretty narrow (literal prefix matching only, no wildcards, a single file at the site root, etc.).
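
To illustrate: under the original standard a Disallow rule is a literal URL prefix, so keeping /tags/blah crawlable while blocking /tags/blah/blubb needs one rule per tag, written out in advance. Some major crawlers (Googlebot, for example) do extend the standard with * wildcards; assuming that extension, a single rule like the sketch below might cover the two-or-more-tags case, but it is an extension, not something every bot honors:

    # Original standard: prefix matching only, so blocking multi-tag URLs
    # means listing every tag explicitly:
    #   Disallow: /tags/blah/
    #   Disallow: /tags/blubb/
    #   ...
    # With the wildcard extension some bots support (an assumption, not
    # part of the original standard), one rule blocks /tags/<tag>/<tag>
    # while leaving /tags/<tag> itself crawlable:
    User-agent: *
    Disallow: /tags/*/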

What about disallowing them based on user-agent in your server?

MarkusQ
Blocking access outright would mean returning some kind of HTTP error. Not sure how Google reacts to a site that produces lots of "server error" responses. Not very enthusiastic to try that out :)
BlaM
It wouldn't have to: you could serve up some cheap static "nothing to see here" page instead.
MarkusQ
That's true. It would at least be better than serving the full page.
BlaM
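
A minimal sketch of what MarkusQ describes, written here as a Python WSGI app (the bot pattern, the URL layout, and the full_page handler are illustrative assumptions, not anything from the thread):

    import re

    # Naive bot detection by user-agent substring; a real list would be longer.
    BOT_PATTERN = re.compile(r"googlebot|bingbot|slurp", re.IGNORECASE)
    CHEAP_PAGE = b"<html><body>Nothing to see here.</body></html>"

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        agent = environ.get("HTTP_USER_AGENT", "")
        # /tags/blah/blubb -> ["tags", "blah", "blubb"]: two or more tags.
        parts = [p for p in path.split("/") if p]
        multi_tag = len(parts) >= 3 and parts[0] == "tags"
        if multi_tag and BOT_PATTERN.search(agent):
            # Serve the cheap static page with 200 OK rather than an error,
            # so the crawler never sees a pile of server errors.
            start_response("200 OK", [("Content-Type", "text/html")])
            return [CHEAP_PAGE]
        return full_page(environ, start_response)

    def full_page(environ, start_response):
        # Stand-in for the real, expensive multi-tag page.
        start_response("200 OK", [("Content-Type", "text/html")])
        return [b"<html><body>Full tag page.</body></html>"]

Note that, unlike "noindex, nofollow", this saves the cost of building the full page, but the crawler could still index the cheap page's URL unless it carries the meta tag as well.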