views: 201

answers: 3

I want to implement a user-facing view counter (similar to what SO has for question views) that tracks the number of unique views to a page. There are a couple of similar questions here but none seem to answer my question fully.

What would be the best setup for this (in terms of database tables etc.)? Would it be good to add a 'views' column to the 'questions' table and simply increment it on every page view? And if I want the views to be unique, I guess I could have another table with question IDs and IP addresses, and only increment the 'views' column if there isn't already an entry for the current IP. However, this 'ip-view' table would get enormous really quickly... Mainly I am concerned with the overhead of having to store every page view and every IP in a table.

How could this be optimized so that it doesn't become a performance bottleneck? Is there a better approach than what I described? Please note that it is very important for me that only unique views are counted.

Update: in addition to suggestions for implementation methods, I'd also like to understand where the performance issues come into play, assuming the naive approach of simply checking whether the IP exists and updating the 'views' column on every page view. Is the main issue the vast number of insertions occurring (assuming heavy traffic), or is it more the size of the object-to-IP mapping table (which could be huge, since a new row will be inserted per question for each new unique visitor)? Should race conditions be considered (I just assumed that an update/increment SQL statement was atomic)? Sorry for all the questions, but I am just lost as to how I should approach this.

A: 

Check my answer here, possibly: http://stackoverflow.com/questions/1269968/incremented-db-field/1269973#1269973

Noon Silk
A: 

There is one approach (off the top of my head) that might work, though I'm not yet sure myself whether it is scalable or even feasible.

If you really wish to store the IPs in the DB and want to avoid clogging it up, you could think of storing them in a hierarchical order.

<ID, IP_PART, LEVEL, PARENT_PART, VIEWS>

So, when a user visits your website from IP 212.121.139.54, the rows in your table would be:

<1, 212, 1, 0, 0>
<2, 121, 2, 1, 0>
<3, 139, 3, 2, 0>
<4, 54, 4, 3, 1>

Points to Note:

  1. Only rows with LEVEL = 4 will have a view count.
  2. To avoid the redundancy of storing VIEWS = 0 for LEVEL = 1, 2, 3, you can think of storing those rows in a different table.
  3. The idea, as conceived, doesn't seem suitable for a small set of IPs.
  4. This neglects the case of a public proxy IP sitting in front of a private network, where more than one machine reaches your website through the same address. But that doesn't seem to be your question, I guess.

So, ciao, and let me know what you end up implementing.

a6hi5h3k
+2  A: 

If you need to track unique views specifically, there are probably two ways to do this... unless you're operating with internal users that you can identify. Either way, you need to keep track of every user that has visited the page.

Tracking can be done either server-side or client-side.

Server-side will need to be IP addresses, unless you're dealing with internal users that you can identify. And whenever you deal with IP addresses all the usual caveats about using them to identify people apply (there could be multiple users per IP, or multiple IPs per user) and you can't do anything about that.

You should also consider that the "huge IP table of death" isn't that bad a solution. Performance will only become an issue if you have hundreds of thousands of users... assuming it's indexed properly, of course.

Client-side probably involves leaving an "I've visited!" cookie. If the cookie is NOT present, increment your view count and set it. If the cookie cannot be created, you'll have to live with an inflated view count. And all the caveats about dealing with cookies apply... which is to say, they'll go bad eventually and disappear.
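The cookie check could look something like the sketch below, using Python's standard `http.cookies` module. Everything specific here is an assumption made for illustration: the `visited_questions` cookie name, the dash-separated list of question IDs it stores, the one-year lifetime, and the in-memory `counts` dict standing in for the real counter.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "visited_questions"  # hypothetical cookie name


def handle_view(cookie_header, question_id, counts):
    """Increment counts[question_id] unless the cookie already lists it.

    cookie_header is the raw Cookie request header (or None).
    Returns the value for a Set-Cookie response header, or None when
    the visit was already counted.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")

    seen = set()
    if COOKIE_NAME in cookie:
        seen = {s for s in cookie[COOKIE_NAME].value.split("-") if s}

    key = str(question_id)
    if key in seen:
        return None  # repeat visit from this browser

    counts[question_id] = counts.get(question_id, 0) + 1
    seen.add(key)

    out = SimpleCookie()
    out[COOKIE_NAME] = "-".join(sorted(seen))
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # assumed lifetime
    out[COOKIE_NAME]["path"] = "/"
    return out[COOKIE_NAME].OutputString()
```

A client that refuses or clears cookies sends no `visited_questions` value, so every one of its requests is counted as new, which is the inflation mentioned above.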

Richard Seviora