You can probably create a filter attribute that rejects the request based on the User-Agent header. Its usefulness is questionable (and it is not a security feature), since the header is easily faked, but it will stop people from requesting the page in a stock browser.
This page contains a list of user agent strings that Googlebot uses.
Sample code to redirect non-Googlebot requests to a 404 action on an error controller:
    using System;
    using System.Web.Mvc;
    using System.Web.Routing;

    [AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
    public class BotRestrictAttribute : ActionFilterAttribute {
        public override void OnActionExecuting(ActionExecutingContext c) {
            // Exact match against one Googlebot user agent string; a missing header also fails the check.
            if (c.RequestContext.HttpContext.Request.UserAgent != "Googlebot/2.1 (+http://www.googlebot.com/bot.html)") {
                // Send everyone else to Error/NotFound via the route named "error".
                c.Result = new RedirectToRouteResult("error", new RouteValueDictionary(new { action = "NotFound", controller = "Error" }));
            }
        }
    }
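Since Googlebot sends more than one user agent string, a looser check that looks for the "Googlebot" token anywhere in the header may be more practical than comparing against a single full string. A minimal sketch, assuming a hypothetical helper class BotUserAgent (not part of ASP.NET MVC) that the filter could call in its condition:

```csharp
using System;

// Hypothetical helper, introduced here for illustration only.
public static class BotUserAgent
{
    // Returns true when the User-Agent value contains the "Googlebot" token,
    // ignoring case. A null or empty header is treated as "not Googlebot".
    public static bool LooksLikeGooglebot(string userAgent)
    {
        return !string.IsNullOrEmpty(userAgent)
            && userAgent.IndexOf("Googlebot", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the filter, the condition would then read if (!BotUserAgent.LooksLikeGooglebot(c.HttpContext.Request.UserAgent)). This is just as easy to fake as the exact comparison, so it is still a convenience and not a security measure.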
EDIT: In response to the comments: if server load is an issue for your sitemap, restricting access to the bots might not be sufficient. Googlebot by itself can grind your server to a halt if it decides to crawl aggressively. You should probably cache the response as well. You can use the same filter attribute and the ASP.NET application cache for that.
Here is a very rough example; it might need tweaking to send the proper HTTP headers:
    [AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
    public class BotRestrictAttribute : ActionFilterAttribute {
        public const string SitemapKey = "sitemap";

        public override void OnActionExecuting(ActionExecutingContext c) {
            if (c.RequestContext.HttpContext.Request.UserAgent != "Googlebot/2.1 (+http://www.googlebot.com/bot.html)") {
                c.Result = new RedirectToRouteResult("error", new System.Web.Routing.RouteValueDictionary(new { action = "NotFound", controller = "Error" }));
                return;
            }
            // If a sitemap is already cached, return it and skip the action entirely.
            var sitemap = c.HttpContext.Cache[SitemapKey] as string;
            if (sitemap != null) {
                c.Result = new ContentResult { Content = sitemap, ContentType = "application/xml" };
            }
        }
    }
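    // In the sitemap action method (needs: using System.Web.Caching;)
    string sitemapString = GetSitemap();
    // Inside a controller, HttpContext is the controller's own property, so use it directly.
    HttpContext.Cache.Add(
        BotRestrictAttribute.SitemapKey, // cache key
        sitemapString,                   // data
        null,                            // no dependencies
        DateTime.Now.AddMinutes(1),      // absolute expiration: one minute from now
        Cache.NoSlidingExpiration,
        CacheItemPriority.Low,
        null                             // no removal callback
    );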