views:

35

answers:

2

I am implementing a simplistic filter on how much of my site unregistered users can access. Naturally, I want to give SEO bots free rein to access most of the site.

I know this is simplistic, but it's not worth doing anything more complicated. I need to compile a list of the user agent names I will allow. For this, I need a list of the names of the bots, starting with Googlebot (I don't even know if that is the official spelling of Google's web crawling bot).

Anyway, I would like a link to a definitive resource that gives the names of the SEO indexing bots. I tried http://www.user-agents.org/ but the granularity is not fine enough - it appears to list every user agent ever created!

+2  A: 

Try this list: http://www.useragentstring.com/pages/Crawlerlist/

That said, the combination of Google, Yahoo, Bing, Baidu, Ask, and AOL represents virtually 100% of the search engine market, so I would recommend adding the crawler user agents for just those to your filter; you really don't need to worry about the rest.
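As a rough illustration, a filter like this can be a simple substring check against an allowlist of crawler tokens. The tokens below are the commonly seen ones for the engines mentioned, but you should verify the current strings against each engine's own documentation before relying on them:

```python
# Minimal sketch of a user-agent allowlist check.
# Token names are assumptions based on commonly published crawler strings;
# confirm them against each search engine's documentation.
ALLOWED_BOT_TOKENS = [
    "Googlebot",    # Google
    "Slurp",        # Yahoo
    "bingbot",      # Bing
    "Baiduspider",  # Baidu
    "Teoma",        # Ask
]

def is_allowed_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in ALLOWED_BOT_TOKENS)

print(is_allowed_bot(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
))  # True
print(is_allowed_bot("Mozilla/5.0 (Windows NT 10.0)"))  # False
```

Note that user agents are trivially spoofable, so this only keeps honest clients honest; if that matters, engines like Google document a reverse-DNS check to verify that a request really came from their crawler.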

jwanagel
@jwanagel: +1 for the link. Just what I was looking for.
morpheous
@jwanagel: I can't locate the user agent name for the AOL bot - do you know what it is?
morpheous
AOL search is powered by Google; they don't have their own crawler.
jwanagel
A: 

You should also remember to prevent users from seeing your content through the search engine's cache:

<meta name="robots" content="noarchive">

kraabus
@kraabus: could you explain what the effect of this meta tag would be?
morpheous
If you search in Google, there's a link under each result that lets you see the page's snapshot, cached by Google: http://www.googleguide.com/cached_pages.html Through this link, everybody is able to see your content. The meta tag tells the search engine not to cache your page: http://www.marketingtitan.com/google_cache
kraabus