I've read the Google docs on sitemap formats, but there's one thing they don't make clear: will search engines automatically look for and find /sitemap_index.xml, or do you have to tell them about it via /robots.txt or the main /sitemap.xml? Can you skip /sitemap.xml entirely and still rely on /sitemap_index.xml being found and harvested?
The best way is to point to your sitemap in robots.txt:
Sitemap: <sitemap_location>
There is a pretty good explanation about this at www.sitemaps.org
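As a minimal sketch, a robots.txt that points crawlers at a sitemap index could look like this (the domain and the filename sitemap_index.xml are placeholders; use your own):

User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap_index.xml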
In your robots.txt you can point to a sitemap or a sitemap index file using the same syntax:
Sitemap: <location>
Search engines will know by looking at the file what kind it is.
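Concretely, the root element gives it away: a sitemap index uses <sitemapindex>, while a regular sitemap uses <urlset>. A small example index, with placeholder URLs, might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-posts.xml</loc>
    <lastmod>2015-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>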
It is also worth noting that sitemap.xml and sitemap_index.xml are only suggested filenames; you may use any names you wish. Unlike robots.txt, which has a known location, search engines will not check sitemap.xml, sitemap_index.xml, or any other location unless you tell them that there is something there.
I've read a lot of discussions about this issue, and no one knows for sure how Google handles it.
Some people say that the crawlers will look for these files by default:
- /sitemap.xml
- /sitemap.xml.gz
- /sitemap.gz
If you read Google's articles about XML sitemaps, they often mention these filenames. Is that a coincidence?
Maybe it's true, but I would suggest following Google's guidelines: reference the sitemap in robots.txt and submit it through Google Webmaster Tools.
I think Webmaster Tools is underrated; it's worth its weight in gold. You receive information directly from Google that helps you improve your website.
If you don't want to log into Webmaster Tools every time the sitemap is updated just to resubmit it, you can ping Google to let them know about the changes.
The link to ping Google: www.google.com/webmasters/tools/ping?sitemap=sitemap_url
More info at Google Support.
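As a rough sketch, you could automate that ping whenever your sitemap is regenerated. This assumes your sitemap index lives at https://www.example.com/sitemap_index.xml (a placeholder):

import urllib.parse
import urllib.request

# URL of the sitemap (or sitemap index) to announce to Google -- placeholder domain
SITEMAP_URL = "https://www.example.com/sitemap_index.xml"

# Google's ping endpoint; the sitemap URL is passed as a query parameter
ping_url = (
    "https://www.google.com/webmasters/tools/ping?sitemap="
    + urllib.parse.quote(SITEMAP_URL, safe="")
)

# A 200 response means Google received the notification; it does not
# guarantee the sitemap will be crawled immediately.
with urllib.request.urlopen(ping_url) as response:
    print("Ping sent, HTTP status:", response.status)

You could run this at the end of whatever job rebuilds your sitemap, so Google is notified without any manual resubmission.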
All of this still seems hard to take in. I try to read everything I can, but sometimes it just doesn't help.