UPDATE: I've just realised that we are using Google Mini Search to crawl the website so that we can support Google Search. This is bound to be creating an anonymous profile, not only for each crawl but maybe even for each page. Would that be possible?

Hi all, some advice needed!

Our website receives approximately 50,000 hits a day, and we use anonymous ASP.NET membership profiles/users. This is resulting in millions (currently 4.5m) of "active" profiles, and the database is 'crawling'. We have a nightly task that cleans up all the inactive ones.

There is no way that we have 4.5m unique visitors (our county population is only half a million). Could this be caused by crawlers and spiders?

Also, if we have to live with this huge number of profiles, is there any way of optimising the DB?

Thanks

Kev

+1  A: 

You could try deleting anonymous profiles in the Session_End event in your Global.asax.cs file.
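
A minimal sketch of that idea follows. It assumes InProc session state (Session_End is only raised in that mode) and the default profile provider, where an anonymous profile is stored under the anonymous ID as its user name. The class name and the "AnonymousId" session key are made up for illustration.

```csharp
using System;
using System.Web;
using System.Web.Profile;

// Hypothetical Global.asax.cs code-behind; the class name and the
// "AnonymousId" session key are assumptions made for this sketch.
public class Global : HttpApplication
{
    private const string AnonymousIdKey = "AnonymousId";

    protected void Session_Start(object sender, EventArgs e)
    {
        // Remember the anonymous id while the request is still available;
        // Session_End runs without a current request to read it from.
        if (!Request.IsAuthenticated && !string.IsNullOrEmpty(Request.AnonymousID))
        {
            Session[AnonymousIdKey] = Request.AnonymousID;
        }
    }

    protected void Session_End(object sender, EventArgs e)
    {
        // Delete the profile stored under the anonymous id, if one was recorded.
        string anonymousId = Session[AnonymousIdKey] as string;
        if (anonymousId != null)
        {
            ProfileManager.DeleteProfile(anonymousId);
        }
    }
}
```

Note the trade-off: this throws away anything saved in the anonymous profile as soon as the session times out, so it only suits sites that do not need anonymous settings to survive between visits.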

There is every likelihood that your site is being crawled, either by a legitimate search engine crawler and/or by a malicious crawler looking for vulnerabilities that would allow hackers to take control of your site/server. You should look into this, regardless of which solution you choose for removing old profiles.

If you are using the default Profile Provider, which keeps all of the profile information in a single column, you might want to read Scott Guthrie's article on a better-performing table-based profile provider.

Daniel Dyson
+1  A: 

Update following conversation:

Might I suggest that you implement a filter that identifies crawlers via request headers and logs the anonymous cookie, which you can decrypt later that same day in order to delete the anonymous aspnet_Profile and aspnet_Users records with the associated UserId.

You might be fighting a losing battle but at least you will get a clear idea of where all the traffic is coming from.
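
As a rough sketch of the header check, the helper below looks for common crawler markers in the User-Agent string. The class name, the method name and the token list are all assumptions for illustration; a real list would come from the user agents you actually see in your logs.

```csharp
using System;

// Hypothetical helper; the token list is illustrative, not exhaustive.
public static class CrawlerDetector
{
    private static readonly string[] CrawlerTokens = { "bot", "crawler", "spider", "slurp" };

    public static bool LooksLikeCrawler(string userAgent)
    {
        // Assumption: a request with no User-Agent at all is not a normal browser.
        if (string.IsNullOrEmpty(userAgent))
        {
            return true;
        }

        // Case-insensitive substring match against each known marker.
        foreach (string token in CrawlerTokens)
        {
            if (userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}
```

Inside a module or Global.asax you would call it as CrawlerDetector.LooksLikeCrawler(Request.UserAgent) and log Request.AnonymousID when it returns true. ASP.NET also exposes Request.Browser.Crawler, which may be enough on its own if your browser definition files are up to date.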


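AnonymousId cookies and, by extension, anonymous profiles stay valid for a long time after last use: the default cookieTimeout for anonymous identification is 100,000 minutes (about 69 days), with sliding expiration. This can result in anonymous profiles piling up.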

A very simple way to handle this is to use ProfileManager.

ProfileManager.DeleteInactiveProfiles(ProfileAuthenticationOption.Anonymous, DateTime.Now.AddDays(-7));

will clear out all the anonymous profiles that have not been accessed in the last 7 days.

But that leaves you with the anonymous records in aspnet_Users. Membership does not expose a method similar to ProfileManager for deleting stale anonymous users.

So...

The best bet is raw SQL: delete the rows from aspnet_Profile that you consider stale, and then run the same query against aspnet_Users where IsAnonymous = 1.

Good luck with that. Once you get it cleaned up, just stay on top of it.
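
The raw SQL cleanup could look something like the sketch below. It assumes the default SqlProfileProvider schema (aspnet_Profile.UserId, aspnet_Users.UserId, IsAnonymous and LastActivityDate, which is stored in UTC) and defines "stale" by LastActivityDate. The class and method names are made up; check the column names against your own database before running anything like this.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Hypothetical helper, assuming the default ASP.NET membership/profile schema.
public static class AnonymousUserPurge
{
    private const string PurgeSql = @"
-- 1) Remove stale anonymous profiles.
DELETE p
FROM dbo.aspnet_Profile AS p
JOIN dbo.aspnet_Users AS u ON u.UserId = p.UserId
WHERE u.IsAnonymous = 1 AND u.LastActivityDate < @cutoff;

-- 2) Remove stale anonymous users that no longer have a profile row.
DELETE u
FROM dbo.aspnet_Users AS u
WHERE u.IsAnonymous = 1
  AND u.LastActivityDate < @cutoff
  AND NOT EXISTS (SELECT 1 FROM dbo.aspnet_Profile AS p WHERE p.UserId = u.UserId);";

    // Returns the total number of rows deleted by both statements.
    public static int Purge(string connectionString, DateTime cutoffUtc)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(PurgeSql, connection))
        {
            command.Parameters.Add("@cutoff", SqlDbType.DateTime).Value = cutoffUtc;
            command.CommandTimeout = 600; // large tables can take a while
            connection.Open();
            return command.ExecuteNonQuery();
        }
    }
}
```

A nightly job could call AnonymousUserPurge.Purge(connectionString, DateTime.UtcNow.AddDays(-7)). With millions of rows it is probably worth deleting in batches to keep the transaction log small, and if other tables reference aspnet_Users (for example Web Parts personalization), their rows would have to be removed first.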


Updated Update:

The code below is only valid on IIS7 AND if you channel all requests through ASP.NET.

You could implement a module that watches for requests to robots.txt, gets the anonymous ID cookie, and stashes it in a robots table, which you can then use to safely purge robot records from your membership/profile tables every night. This might help.

Example:

using System;
using System.Diagnostics;
using System.Web;

namespace NoDomoArigatoMisterRoboto
{
    public class RobotLoggerModule : IHttpModule
    {
        #region IHttpModule Members

        public void Init(HttpApplication context)
        {
            context.PreSendRequestHeaders += PreSendRequestHeaders;
        }

        public void Dispose()
        {
            //noop
        }

        #endregion

        private static void PreSendRequestHeaders(object sender, EventArgs e)
        {
            HttpRequest request = ((HttpApplication)sender).Request;

            // Well-behaved crawlers fetch robots.txt, so treat any client that asks for it as a robot.
            bool isRobot =
                request.Url.GetLeftPart(UriPartial.Path).EndsWith("robots.txt", StringComparison.OrdinalIgnoreCase);

            // Null when no anonymous identifier is present for this request.
            string anonymousId = request.AnonymousID;

            if (anonymousId != null && isRobot)
            {
                // log this id for pruning later
                Trace.WriteLine(string.Format("{0} is a robot.", anonymousId));
            }
        }
    }
}

Reference: http://www.codeproject.com/Articles/39026/Exploring-Web-config-system-web-httpModules.aspx


Sky Sanders
I am clearing them up, but I'm using the default inactive time, which I think is around 60 days. I can quite easily change that to 7, but the website manager would rather they stayed for as long as possible because they contain customisations to the home page. So even clearing up 60-day-old profiles is retaining 4.5 million...
Mantorok
@Mantorok - you are keeping anonymous customizations for users who have not visited your site for 2 months? That sounds like retention of the anal kind. Would you even remember what aesthetic changes you made to a site you visited, anonymously, 2 months ago? Just sayin'... ;-)
Sky Sanders
@code, no I completely agree with you, I wanted it to be a week or so, but I had to take orders. I may have to have another little 'chat' with our web manager :-)
Mantorok
@code, that's an interesting update. Do you have any details on what I should be looking for in the headers? Thanks.
Mantorok
@code, thanks for that, I will look into that now. See my last update - Doh!
Mantorok