Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders?
A:
Why?
Anyone doing evil (e.g., gathering email addresses to spam) will just ignore robots.txt, because compliance with it is voluntary. So the only crawlers you'd actually be blocking are the legitimate ones that bother to obey it.
But if you insist on doing it anyway, that's what the User-Agent: line in robots.txt is for.
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
With lines for all the other search engines you'd like traffic from, of course. Robotstxt.org has a partial list.
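If you want to sanity-check a rule set like the one above before deploying it, here is a minimal sketch using Python's standard urllib.robotparser. It assumes the tokens Slurp (Yahoo!) and msnbot (MSN), which are the names those crawlers have historically used; check each engine's current documentation before relying on them. The helper name is_allowed and the "EvilScraper" agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: one record per permitted crawler, then a catch-all block.
# "Slurp" and "msnbot" are assumed tokens for Yahoo! and MSN; verify them.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url="/index.html"):
    """Return True if a compliant crawler with this token may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Slurp", "msnbot", "EvilScraper"):
        print(agent, is_allowed(agent))
```

This only tells you what a well-behaved crawler would conclude from the file. It does nothing to stop a client that never reads robots.txt in the first place.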
derobert
2009-03-22 19:35:01
A:
There is an API available at www.atlbl.com that can detect web crawlers and let you block the ones that don't honour robots.txt or, in your case, allow only Google, Yahoo, and MSN.