Question 1

We would like to generate a sitemap for our CMS site.

We have multiple front-end servers and approximately a million articles.


  • multiple MS SQL servers
  • multiple front end servers (load balanced)
  • IIS 6
  • Windows 2003

Keeping the sitemaps (the sitemap index file and the sitemap files) physically on the front-end servers would be an operations nightmare and error-prone.

So we are considering using HTTP handlers instead, so that it does not matter which server gets the request: any of them will be able to serve the correct XML file.

Question 2

If we ping Google each time we publish a new article, will that affect us negatively if it happens more than once an hour?


+2  A: 

I would generate the XML when something changes rather than on demand each time Google requests it. That way it only updates as often as it needs to.

I would store this centrally. If you have a CDN, whack it up on there and redirect to it. You might argue this is a nightmare but is it any worse than having all your frontends generating their own version of the sitemap? Answer: No, it's a lot more efficient.
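
To make the "generate once, store centrally" idea concrete, here is a rough sketch in Python (not your ASP.NET stack, just to show the shape). The sitemaps.org protocol caps each sitemap file at 50,000 URLs, so a million articles means about 20 files plus one index. The function name `build_sitemaps`, the file names, and the `base_url` parameter are all made up for illustration; where the URLs come from and where the output gets uploaded is up to you.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50_000  # per-file limit from the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def build_sitemaps(urls, base_url):
    """Return {filename: xml_text} for every sitemap file plus the index.

    `urls` is a list of absolute article URLs; `base_url` is where the
    generated files will be served from (hypothetical, e.g. a CDN path).
    """
    files = {}
    for start in range(0, len(urls), MAX_URLS_PER_FILE):
        chunk = urls[start:start + MAX_URLS_PER_FILE]
        name = f"sitemap-{start // MAX_URLS_PER_FILE + 1}.xml"
        entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in chunk)
        files[name] = f'{XML_HEADER}<urlset xmlns="{NS}">{entries}</urlset>'

    # Build the index from the files created so far, then add it to the result.
    index_entries = "".join(
        f"<sitemap><loc>{escape(base_url + '/' + name)}</loc></sitemap>"
        for name in files
    )
    files["sitemap-index.xml"] = (
        f'{XML_HEADER}<sitemapindex xmlns="{NS}">{index_entries}</sitemapindex>'
    )
    return files
```

You would run something like this from the publish workflow (or a scheduled job), push the resulting files to the central location, and have every front end redirect sitemap requests there.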

If you don't have a CDN, I would investigate a method for redirecting a request from one node to another. You might even be able to control this with your load-balancer so certain user-agent strings go straight to your sitemap-generating node.

As for Question 2... Google claims it will check back automatically once it knows how fast your site updates. I would ping manually the first few times; after that Google should know what rate to keep going at.

But as long as you're not sending 20 pings an hour, I doubt they'll mind too much. It's not as if it changes your SERP performance.
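
If you do want to ping on publish but keep the rate down, one option is to coalesce pings so that a burst of new articles results in a single notification. A minimal sketch, again in Python: `PingThrottle` and the `send_ping` callback are hypothetical names (the callback would do whatever HTTP request your ping actually is), and the one-hour interval is only an assumption taken from the figure in your question, not a limit Google publishes.

```python
import time


class PingThrottle:
    """Coalesce 'article published' events into at most one ping per interval."""

    def __init__(self, send_ping, min_interval_seconds=3600, clock=time.monotonic):
        self._send_ping = send_ping          # hypothetical callback that does the real ping
        self._min_interval = min_interval_seconds
        self._clock = clock                  # injectable so the logic can be tested
        self._last_sent = None
        self._pending = False

    def article_published(self):
        """Record a publish; ping now if allowed, otherwise remember one is due."""
        self._pending = True
        return self.flush()

    def flush(self):
        """Send the pending ping if the interval has elapsed. Returns True if sent."""
        if not self._pending:
            return False
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False
        self._send_ping()
        self._last_sent = now
        self._pending = False
        return True
```

Call `article_published()` from the publish workflow and `flush()` from a timer, so a ping that was held back still goes out once the interval has passed. With several front ends, this state would need to live in one place (the same node or job that generates the sitemap), otherwise each server would throttle independently.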

I really like the user-agent / load balancer approach - that sounds very do-able. Going to chat to our ops team and see what they think. Thanks!
Rihan Meij