views:

153

answers:

3

I'm building a website with user-generated content. On the home page I want to show a list of all created items, and I want to be able to sort them by a view counter. That sounds easy, but I want multiple counters: I want to know which was the most visited item in the last day, last week, last month, or overall.

My first idea was to create 4 counter columns in the item's DB table, one each for daily, weekly, monthly, and overall, and then create a cron job that clears the daily counter every 24 hours, the weekly counter every 7 days, and so on.

But my problem with this is: what happens if I want to know the most viewed item of the week just after the weekly counter was cleared?

What I need is an efficient way to maintain a continuous counter, which is decreased for every page view that becomes too old and increased for every new page view.

Right now I'm thinking of a solution based on Redis, but I don't have a concrete solution yet.

I'm just looking for a general idea here, but FYI I'm developing this application in Ruby on Rails.

+1  A: 

What I would suggest is tracking hits on each page, with a timestamp and any user id or whatever else you can store. Then you can calculate the counters however you like, and change them later, because you have the data in an easy-to-use format. A table with entity (page), user id, and timestamp should be good. Just insert into it whenever the page is requested.

To reduce the number of inserts you can batch them together in your software. Constructing a multi-row INSERT, as MySQL supports, will save overhead. Your classes just need to buffer the hits and build the insert as described. One idea is to flush not only on a timer but also after a fixed number of rows, so you can say that if the server goes down you lose at most x rows of hits.
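A minimal sketch of that buffering idea in Ruby (class, table, and column names here are made up for illustration; the flusher block is where you would actually execute the SQL against MySQL):

```ruby
# Buffers hits in memory and flushes them as ONE multi-row INSERT,
# either when the buffer reaches max_rows or when flush is called
# (e.g. from a timer) -- so a crash loses at most max_rows hits.
class HitBuffer
  def initialize(max_rows, &flusher)
    @max_rows = max_rows
    @flusher  = flusher   # receives the finished SQL string
    @rows     = []
  end

  def track(page_id, user_id, timestamp)
    @rows << [page_id, user_id, timestamp]
    flush if @rows.size >= @max_rows
  end

  def flush
    return if @rows.empty?
    values = @rows.map { |p, u, t| "(#{p}, #{u}, '#{t}')" }.join(", ")
    sql = "INSERT INTO hits (page_id, user_id, hit_at) VALUES #{values}"
    @flusher.call(sql)
    @rows.clear
  end
end
```

In a real app you would escape or bind the values properly instead of interpolating them into the string.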

You can also have a MySQL trigger run after the batch of inserts is done to update reporting tables, so you don't have to constantly hit the main hit-tracking table.

Also, if this really needs to be high throughput, it could be broken off into its own shard and accessed via Ajax calls to do the hit tracking and get the counts.

Jeff Beck
I thought of that too, but this means I need an extra table with a (hopefully) large amount of write access, which is exactly what I am trying to avoid here.
jigfox
You can always batch the writes to this table: have a track-hit function that only inserts in batch every x minutes. You give up real-time results, but that may be fine. If you do have a hit table you batch-insert into, you can create triggers that populate a report table of the counts as needed.
Jeff Beck
I've never seen this. I'm using a MySQL DB; do you have any links to a site with an example of batching writes to the DB?
jigfox
I have edited my answer with some of the info we have been discussing and a link; let me know if you need any more info.
Jeff Beck
I see what you meant here now; I never thought of this. It seems to be a good idea, but I still have to store all the values for the batch insert, and for that I could use Redis. I will definitely look into this, and then I will let you know how it works.
jigfox
If you are going that route, push it all into its own application, separated from your main one; that way you can use two sets of hardware and have no interdependencies.
Jeff Beck
+1  A: 

What you can do is store what is visited and at what date (timestamp), and do this every time something is visited. When you want to retrieve what was visited, you would choose the rows within a range of dates (timestamps) and add them together.

OR

Each of your links will have its own row for every day it is visited. If a link is visited more than once in one day, the already existing value is incremented by 1.

You could use the second example and do weekly storage instead of daily, if your weeks are something like Sunday to Saturday rather than the 7 days before today. Replace the word day with the word week and you have smaller amounts of data to store.

I see that you want to avoid huge data stores, but for what you want this to do, I don't see much better than this.

Here is a possibility for Redis, after looking at some documentation.

SET link_id|date "visit_count"

This stores the link_id (or whatever you call it) along with the date, separated by | or whatever character you want to use. Under that key you store the visit_count.

Say you want to add a hit to that link on this date. You would GET link_id|date, add 1 to the visit_count it returns, and then store it back the way shown above.

If you want the number of hits for a specific date, you can GET link_id|date again.

You would just use Ruby on Rails to fill in link_id, date, and visit_count with the appropriate values.
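In Ruby, the key scheme might look like this (a plain hash stands in for the Redis connection here; with a Redis client you would issue GET/SET on the same keys — all names are illustrative):

```ruby
# Per-day view counters keyed as "link_id|date".
def hit_key(link_id, date)
  "#{link_id}|#{date}"
end

# Read-modify-write as described above. Note: against a real Redis
# server this GET + SET pair is not atomic.
def add_hit(store, link_id, date)
  key = hit_key(link_id, date)
  count = (store[key] || "0").to_i
  store[key] = (count + 1).to_s
end

def hits_on(store, link_id, date)
  (store[hit_key(link_id, date)] || "0").to_i
end
```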

Hope this helps you out.

Jonathan Czitkovics
That's a nice idea, but do you know an elegant way to do this with Redis or something similar? Because I don't want to write every page view to the database.
jigfox
Unfortunately I have never used Redis, so I couldn't tell you how to do it there. I did add another idea, though.
Jonathan Czitkovics
@Jens Fahnenbruck I did some research, found some information on Redis for you, and updated my answer.
Jonathan Czitkovics
Retrieving the value, adding 1, and storing it back is very bad (it isn't atomic). That's the real reason I want to use Redis: it has an atomic command to increment a counter value, "INCR counter_key".
jigfox
A: 

What you could do is create a table called ViewCounters.

There would be a 'pageId' column, a 'day' column, and a 'views' column. The pageId would correspond to the page being viewed, and the 'day' would correspond to the day it was viewed on.

Each time a page is viewed, it finds (or creates if there is not yet one) the row in the ViewCounters table with its pageId, and with the current day. It would then increment the 'views' column for that row.

This gives you maximum flexibility with regards to creating new counters and accessing history. Furthermore, view counts for monthly, weekly, and daily counters are all very cheap to generate.

Another great benefit is that it's only one row per page per day, which isn't really that bad.
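The find-or-create-then-increment logic above can be sketched like this (a hash keyed by [page_id, day] stands in for the ViewCounters table; daily, weekly, and monthly counts then fall out of a simple date-range sum):

```ruby
require "date"

# In-memory stand-in for the ViewCounters table:
# one entry (row) per page per day.
def record_view(counters, page_id, day)
  counters[[page_id, day]] = (counters[[page_id, day]] || 0) + 1
end

# Any counter -- daily, weekly, monthly -- is a sum over a date range.
def views_between(counters, page_id, from, to)
  (from..to).sum { |day| counters[[page_id, day]] || 0 }
end
```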

Cam
Same problem as with counter columns in the countable objects: I want to reduce write access to the DB drastically.
jigfox
Not sure I understand what the problem is. How can you get any better than writing to one row (which is trivially found) per pageview? Edit: Oh, you want to cache and then batch-write. Makes sense. Well, one option would be to save the pageviews to files (named like [pageid].data or something). Each day, have a cron job batch-write that data to the database and then delete the files. Then you only write to the DB once per day.
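That file-then-cron idea could be sketched like this (file layout and names are made up; one line is appended per view, and the nightly job just tallies the log into the single batched DB write):

```ruby
# Append-only log: one page id per line, one line per view.
def log_view(path, page_id)
  File.open(path, "a") { |f| f.puts(page_id) }
end

# Nightly cron job: tally the log into per-page counts (this feeds
# the one batched DB write per day), then truncate the log.
def flush_views(path)
  counts = Hash.new(0)
  File.foreach(path) { |line| counts[line.strip.to_i] += 1 }
  File.truncate(path, 0)
  counts
end
```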
Cam
Yes, I want to cache and then batch-write, but I think writing to a file would be even more expensive than a DB update.
jigfox
Yeah, exactly. Well, what else are you going to do? I mean, that's why you have a database. Mind me asking why you can't even write once per pageview? DB writes are extremely fast.
Cam
Why don't you test both file writing and DB access and see how long they take? You might be surprised. Or you might be right (depending on your application's scope), and we'll keep looking :)
Cam