Update following conversation:
Might I suggest that you implement a filter that identifies crawlers via their request headers and logs the anonymous cookie, so that later that same day you can decrypt it and delete the anonymous aspnet_Profile and aspnet_Users records with the associated UserId.
You might be fighting a losing battle but at least you will get a clear idea of where all the traffic is coming from.
AnonymousId cookies and, by extension, anonymous profiles are valid for 90 days after last use. This can result in anonymous profiles piling up.
A very simple way to handle this is to use ProfileManager:

ProfileManager.DeleteInactiveProfiles(ProfileAuthenticationOption.Anonymous, DateTime.Now.AddDays(-7));

This will clear out all the anonymous profiles that have not been accessed in the last 7 days.
But that leaves you with the anonymous records in aspnet_Users. Membership does not expose a method similar to ProfileManager's for deleting stale anonymous users.
So...
The best bet is raw SQL: delete from aspnet_Profile where you consider the profiles stale, and then run a matching delete against aspnet_Users where IsAnonymous = 1.
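As a sketch, assuming the default ASP.NET membership schema (aspnet_Profile and aspnet_Users with the standard UserId, IsAnonymous, and LastActivityDate columns) on SQL Server, the cleanup could look something like this:

```sql
-- Sketch only: delete the profile rows first (they reference UserId),
-- then the matching anonymous user rows. The 7-day cutoff is an example;
-- pick whatever staleness window suits your site.
DELETE FROM aspnet_Profile
WHERE UserId IN (
    SELECT UserId
    FROM aspnet_Users
    WHERE IsAnonymous = 1
      AND LastActivityDate < DATEADD(day, -7, GETDATE())
);

DELETE FROM aspnet_Users
WHERE IsAnonymous = 1
  AND LastActivityDate < DATEADD(day, -7, GETDATE());
```

Run it in a transaction the first time and check the row counts before committing.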
Good luck with that. Once you get it cleaned up, just stay on top of it.
Updated Update:
The code below is only valid on IIS7, and only if you channel all requests (including static files such as robots.txt) through ASP.NET.
You could implement a module that watches for requests to robots.txt, grabs the anonymous id cookie, and stashes it in a robots table, which you can later use to safely purge your membership/profile tables of robot records every night. This might help.
Example:
using System;
using System.Diagnostics;
using System.Web;

namespace NoDomoArigatoMisterRoboto
{
    public class RobotLoggerModule : IHttpModule
    {
        #region IHttpModule Members

        public void Init(HttpApplication context)
        {
            context.PreSendRequestHeaders += PreSendRequestHeaders;
        }

        public void Dispose()
        {
            // noop
        }

        #endregion

        private static void PreSendRequestHeaders(object sender, EventArgs e)
        {
            HttpRequest request = ((HttpApplication)sender).Request;

            bool isRobot =
                request.Url.GetLeftPart(UriPartial.Path)
                    .EndsWith("robots.txt", StringComparison.InvariantCultureIgnoreCase);

            // AnonymousID is already decrypted for us by the
            // AnonymousIdentificationModule, when it is enabled.
            string anonymousId = request.AnonymousID;

            if (anonymousId != null && isRobot)
            {
                // Log this id for pruning later.
                Trace.WriteLine(string.Format("{0} is a robot.", anonymousId));
            }
        }
    }
}
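The nightly purge could then join against the logged ids. This is a sketch only: the Robots table (with a RobotAnonymousId column) is hypothetical, and it assumes, as with the default providers, that the anonymous id is stored as the UserName of the anonymous aspnet_Users row:

```sql
-- Sketch: purge profile and user rows for ids the module flagged as robots.
-- "Robots" and "RobotAnonymousId" are hypothetical names; adjust to your schema.
DELETE FROM aspnet_Profile
WHERE UserId IN (
    SELECT u.UserId
    FROM aspnet_Users u
    INNER JOIN Robots r ON r.RobotAnonymousId = u.UserName
    WHERE u.IsAnonymous = 1
);

DELETE u
FROM aspnet_Users u
INNER JOIN Robots r ON r.RobotAnonymousId = u.UserName
WHERE u.IsAnonymous = 1;
```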
Reference: http://www.codeproject.com/Articles/39026/Exploring-Web-config-system-web-httpModules.aspx