Hi,

in my robots.txt file, I have the following line

User-agent: Googlebot-Mobile
Disallow: /

User-agent: Googlebot
Disallow: /

Sitemap: http://mydomain.com/sitemapindex.xml

I know that if I include only the first four lines, Googlebot won't index the site, but if I add the last line, Sitemap: http://mydomain.com/sitemapindex.xml, will Googlebot be able to index the site?

Thanks,

+2  A: 

No, I don't think Google will index it. It is really a question of good bots versus bad bots: even if you add a robots.txt file to restrict part of your site, badly behaved bots can still crawl it, while well-behaved bots like Googlebot will honor the rules. robots.txt is just a warning sign, not a security wall.
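For illustration, here is a minimal sketch in Python (standard urllib.robotparser) of how a well-behaved crawler consults robots.txt before fetching a page; the rules mirror the robots.txt from the question, and mydomain.com plus the page URL are just placeholders:

    # Minimal sketch of how a polite crawler consults robots.txt before
    # fetching a page. The rules mirror the robots.txt from the question;
    # mydomain.com and the page URL are placeholders.
    from urllib.robotparser import RobotFileParser

    rules = [
        "User-agent: Googlebot-Mobile",
        "Disallow: /",
        "",
        "User-agent: Googlebot",
        "Disallow: /",
        "",
        "Sitemap: http://mydomain.com/sitemapindex.xml",
    ]

    parser = RobotFileParser()
    parser.parse(rules)

    url = "http://mydomain.com/some-page.html"

    # A polite bot checks the rules first and backs off when disallowed.
    if parser.can_fetch("Googlebot", url):
        print("allowed - would fetch", url)
    else:
        print("disallowed - a well-behaved bot skips", url)

    # A misbehaving bot simply never runs this check, which is why
    # robots.txt is a warning sign rather than a security mechanism.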

Shoban
Agreed on the security concerns; the OP might not be aware of bad-bot scenarios if he wants to keep his site "off the map" completely.
Nullw0rm
+4  A: 

I tested your robots.txt against my own domain (which has a sitemap entry for every page), and both Googlebot and Googlebot-Mobile reported that they were disallowed access.

Based on this, I would say the robots.txt file takes precedence over any sitemap.

Plus, logically speaking, if you block the entire domain, the bot is also disallowed access to the sitemap itself. The Sitemap entry just tells crawlers where to find your sitemap; it is not an authorization to access it.

Even if you allowed access to the sitemap, I don't think bots would crawl your site: sitemaps are designed more for telling the bot how often to crawl your site, not what it is allowed to crawl.
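To illustrate that distinction, here is a small Python sketch using the standard urllib.robotparser module (site_maps() needs Python 3.8 or newer): the parser reports the sitemap location advertised by the Sitemap line, yet still denies both Google user agents permission to fetch that very URL. The domain is the question's placeholder.

    # The Sitemap line only advertises a location; it does not grant access.
    from urllib.robotparser import RobotFileParser

    rules = [
        "User-agent: Googlebot-Mobile",
        "Disallow: /",
        "",
        "User-agent: Googlebot",
        "Disallow: /",
        "",
        "Sitemap: http://mydomain.com/sitemapindex.xml",
    ]

    parser = RobotFileParser()
    parser.parse(rules)

    # The sitemap location is advertised...
    print(parser.site_maps())  # ['http://mydomain.com/sitemapindex.xml']

    # ...but the crawlers are still not authorized to fetch it.
    print(parser.can_fetch("Googlebot", "http://mydomain.com/sitemapindex.xml"))         # False
    print(parser.can_fetch("Googlebot-Mobile", "http://mydomain.com/sitemapindex.xml"))  # False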

Michael Wales
+1  A: 

Googlebot will not even be able to touch the sitemapindex.xml:

  • robots.txt is a crawler directive.
  • the sitemap.xml is fetched via the Googlebot crawler.
  • so Googlebot will not access the sitemapindex.xml.
  • no crawl coverage, no indexing, no SERP listing.

You can test this with the robots.txt verification tool in Google Webmaster Tools and the Fetch as Googlebot feature (in the Labs section).
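As a rough local stand-in for that check, the Python sketch below (standard library only) downloads the live robots.txt and asks whether either Google user agent may fetch the sitemap index; mydomain.com is the question's placeholder domain, so swap in your real one before running it.

    # Rough local equivalent of the robots.txt verification check:
    # fetch the live robots.txt and test access to the sitemap index.
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser()
    parser.set_url("http://mydomain.com/robots.txt")  # placeholder domain
    parser.read()  # fetches and parses the live robots.txt

    for agent in ("Googlebot", "Googlebot-Mobile"):
        allowed = parser.can_fetch(agent, "http://mydomain.com/sitemapindex.xml")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")

    # With the rules from the question, both agents print "blocked", so the
    # sitemap index is never retrieved and nothing gets indexed.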

Franz
You should correct the typos a little, but +1 for your clarity on the terms.
Nullw0rm