views:

390

answers:

3

Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders?

+10  A: 

Why?

Anyone doing evil (e.g., gathering email addresses to spam) will just ignore robots.txt. So you're only going to be blocking legitimate search engines, as robots.txt compliance is voluntary.

But — if you insist on doing it anyway — that's what the User-Agent: line in robots.txt is for.

User-agent: googlebot
Disallow: 

User-agent: *
Disallow: /

With lines for all the other search engines you'd like traffic from, of course. Robotstxt.org has a partial list.

derobert
+6  A: 

User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
User-agent: Slurp
Allow: /
User-Agent: msnbot
Disallow: 

Slurp is Yahoo's robot

NoahD
A: 

There is an API available at www.atlbl.com that can detect webcrawlers and let you block ones that don't honour robots.txt, or in your case only allow Google, Yahoo, and MSN.