What's a good way to implement a Web Page counter?

On the surface this is a simple problem, but it gets complicated once you have to deal with search engine crawlers and robots, multiple clicks by the same user, and page refreshes.

Specifically, what is a good way to ensure links aren't just 'clicked up' by a user clicking repeatedly? IP address? Cookies? Both have drawbacks (IP addresses aren't necessarily unique, and cookies can be turned off).

Also, what is the best way to store the data? Increment a counter directly, or store each click as a record in a log table and summarize occasionally?

Any real-world experience would be helpful.

+++ Rick ---

A: 

If you can use PHP, you can use sessions to track activity from particular users. In conjunction with a database, you can also track activity from particular IP addresses, which you may assume belong to the same user.

Use timestamps to limit hits (assume no more than 1 hit per 5 seconds, for example), and to tell when new "visits" to the site occur (if the last hit was over 10 minutes ago, for example).
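To make the timestamp idea concrete, here is a minimal sketch in Python rather than PHP. The `HitTracker` class, its field names, and the in-memory dictionary are all hypothetical; a real site would keep the last-seen timestamps in the session or a database. The 5-second and 10-minute thresholds are the example values from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class HitTracker:
    """Hypothetical in-memory tracker; real code would persist this state."""
    min_hit_gap: float = 5.0       # seconds: at most one hit per visitor per 5 s
    visit_timeout: float = 600.0   # seconds: 10 minutes of silence starts a new visit
    hits: int = 0
    visits: int = 0
    _last_hit: dict = field(default_factory=dict)

    def record(self, visitor_key: str, now: float) -> bool:
        """Record a request at time `now` (seconds). Returns True if it counted as a hit."""
        last = self._last_hit.get(visitor_key)
        if last is not None and now - last < self.min_hit_gap:
            return False  # too soon after the last counted hit: ignore it
        if last is None or now - last > self.visit_timeout:
            self.visits += 1  # first request, or the last hit was over 10 minutes ago
        self.hits += 1
        self._last_hit[visitor_key] = now
        return True
```

Here `visitor_key` could be a session ID or an IP address, whichever you decide to treat as "the same user". Ignored requests do not reset the timer, so a visitor who keeps refreshing still gets one hit every 5 seconds.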

You may find $_SERVER[] properties that aid you in detecting bots or visitor trends (such as browser usage).

edit: I've tracked hits & visits before, counting a page view as a hit, and +1 to visits when a new session is created. It was fairly reliable (more than reliable enough for the purposes I used it for). Browsers that don't support cookies (and thus don't support sessions) and users who disable cookies are fairly uncommon nowadays, so I wouldn't worry about it unless you have a reason to need very high accuracy.

Zachery Delafosse
IP addresses aren't reliable on a long-term basis
Cameron
I'm using ASP.NET (MVC), and although Session is an option, it isn't going to help with cookie-less access from robots. Plus, Session has a bit of overhead that this app otherwise wouldn't need.
Rick Strahl
+1  A: 

Use IP Addresses in conjunction with Sessions. Count every new session for an IP address as one hit against your counter. You can store this data in a log database if you think you'll ever need to look through it. This can be useful for calculating when your site gets the most traffic, how much traffic per day, per IP, etc.
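A rough sketch of that idea in Python, with an in-memory list standing in for the log database. The `SessionHitLog` class and its method names are made up for illustration; only the rule "one hit per new (IP, session) pair" comes from the text above.

```python
from collections import Counter
from datetime import datetime


class SessionHitLog:
    """Hypothetical stand-in for a log table of (ip, session_id, timestamp) rows."""

    def __init__(self):
        self._seen = set()   # (ip, session_id) pairs already counted
        self.log = []        # one row per counted hit

    def record(self, ip: str, session_id: str, when: datetime) -> bool:
        """Count a hit only the first time this session is seen for this IP."""
        key = (ip, session_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.log.append((ip, session_id, when))
        return True

    def hits_per_day(self) -> Counter:
        """Summarize the log by calendar day."""
        return Counter(when.date() for _, _, when in self.log)

    def hits_per_ip(self) -> Counter:
        """Summarize the log by IP address."""
        return Counter(ip for ip, _, _ in self.log)
```

Keeping the raw rows is what makes the per-day and per-IP summaries possible later; a bare counter would not let you answer those questions.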

Mike Trpcic
A: 

If I were you, I'd give up on my counter being accurate in the first place. Every solution (e.g. cookies, IP addresses, etc.), like you said, tends to be unreliable. So, I think your best bet is to use redundancy in your system: use cookies, "Flash-cookies" (shared objects), IP addresses (perhaps in conjunction with user-agents), and user IDs for people who are logged in.

You could implement some sort of scheme where any unknown client is given a unique ID, which gets stored (hopefully) on the client's machine and re-transmitted with every request. Then you could tie an IP address, user agent, and/or user ID (plus anything else you can think of) to every unique ID and vice-versa. The timestamp and unique ID of every click could be logged in a database table somewhere, and each click (at least, each click to your website) could be let through or denied depending on how recent the last click was for the same unique ID. This is probably reliable enough for short term click-bursts, and long-term it wouldn't matter much anyway (for the click-up problem, not the page counter).
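As a rough illustration of that scheme, here is a Python sketch. Everything in it is an assumption for the sake of the example: the `ClickGate` class, the 2-minute default window, and the choice to measure recency against the last click of any kind (allowed or denied) for the same ID. Storage is in memory; a real implementation would use a cookie for the ID and a database table for the log.

```python
import uuid


class ClickGate:
    """Hypothetical sketch: issue client IDs, log clicks, deny rapid repeats."""

    def __init__(self, min_gap_seconds: float = 120.0):
        self.min_gap_seconds = min_gap_seconds
        self.clients = {}      # client_id -> set of (ip, user_agent) seen with it
        self.click_log = []    # (client_id, timestamp, allowed) for every click
        self._last_click = {}  # client_id -> timestamp of its most recent click

    def identify(self, client_id, ip: str, user_agent: str) -> str:
        """Return the client's ID, issuing a fresh one if it is missing or unknown."""
        if client_id is None or client_id not in self.clients:
            client_id = uuid.uuid4().hex
            self.clients[client_id] = set()
        # Tie the IP address and user agent to the ID for later cross-checking.
        self.clients[client_id].add((ip, user_agent))
        return client_id

    def allow_click(self, client_id: str, now: float) -> bool:
        """Log the click and let it through only if the last one was long enough ago."""
        last = self._last_click.get(client_id)
        allowed = last is None or now - last >= self.min_gap_seconds
        self._last_click[client_id] = now
        self.click_log.append((client_id, now, allowed))
        return allowed
```

The value returned by `identify` is what you would send back to the client (for example in a cookie) so that it comes back with the next request.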

Friendly robots should have their user agent set appropriately and can be checked against a list of known robot user agents (I found one here after a simple Google search) in order to be properly identified and dealt with separately from real people.
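A minimal Python sketch of such a check. The token list below is a short, illustrative sample of substrings that some well-known crawlers include in their user agents; it is not the list linked above and is nowhere near complete. The function name is made up.

```python
# Illustrative sample only; a real list would be much longer and kept up to date.
KNOWN_BOT_TOKENS = (
    "googlebot",
    "bingbot",
    "slurp",
    "duckduckbot",
    "baiduspider",
    "yandexbot",
)


def is_known_bot(user_agent) -> bool:
    """Return True if the user agent contains a token from the known-robot list."""
    if not user_agent:
        return False  # a missing user agent is unidentified, not a *known* bot
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)
```

This only catches robots that identify themselves honestly, which is why it needs to be combined with the other measures above.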

Cameron
Thanks Cameron. This is where I am at this point. The point of the question was to see if there are any better approaches available.
Rick Strahl
+1  A: 

So I played around with this a bit based on the comments here. What I came up with is incrementing a counter in a simple field. In my app I have code snippet entities with a Views property.

When a snippet is viewed, a method filters requests with a whitelist so that (hopefully) only browsers are counted:

    public bool LogSnippetView(string snippetId, string ipAddress, string userAgent)
    {
        // Requests without a user agent are treated as non-browser traffic.
        if (string.IsNullOrEmpty(userAgent))
            return false;

        userAgent = userAgent.ToLower();

        // Whitelist: only count user agents that look like browsers.
        bool looksLikeBrowser =
            userAgent.Contains("mozilla") || userAgent.StartsWith("safari") ||
            userAgent.StartsWith("blackberry") || userAgent.StartsWith("t-mobile") ||
            userAgent.StartsWith("htc") || userAgent.StartsWith("opera");

        if (!looksLikeBrowser)
            return false;

        this.Context.LogSnippetClick(snippetId, ipAddress);
        return true;
    }

The stored procedure then uses a separate table to temporarily hold the latest views; each row stores the snippet Id, the entered date, and the IP address. When a view comes in, the procedure checks whether the same IP address has accessed this snippet within the last 2 minutes. If so, nothing is logged.

If it's a new view, the view is logged (again SnippetId, IP, Entered) and the actual Views field is updated on the Snippets table.

If it's not a new view, the log table is cleaned up by deleting any views older than 4 minutes. This should result in a minimal number of entries in the view log table at any time.

Here's the stored proc:

    ALTER PROCEDURE [dbo].[LogSnippetClick]
        @SnippetId AS VARCHAR(MAX),
        @IpAddress AS VARCHAR(MAX)
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Only count the view if this IP address has not already
        -- viewed this snippet within the last 2 minutes.
        IF NOT EXISTS (
            SELECT Id FROM SnippetClicks
            WHERE SnippetId = @SnippetId AND IpAddress = @IpAddress AND
                  Entered > DATEADD(minute, -2, GETDATE())
        )
        BEGIN
            INSERT INTO SnippetClicks (SnippetId, IpAddress, Entered)
            VALUES (@SnippetId, @IpAddress, GETDATE());

            UPDATE CodeSnippets SET Views = Views + 1
            WHERE Id = @SnippetId;
        END
        ELSE
        BEGIN
            -- Repeat view: take the opportunity to purge log rows older than 4 minutes.
            DELETE FROM SnippetClicks WHERE Entered < DATEADD(minute, -4, GETDATE());
        END
    END

This seems to work fairly well. As others mentioned, this isn't perfect, but it looks good enough in initial testing.

Rick Strahl