tags:

views:

44

answers:

2

Introduction

I am wondering what is the best way to implement a unique views system... I think I know how, so if I could explain how I think to do it and you point out any mistakes or improvements.

obviously I will have to store a log table containing a video id and something which (relatively) uniquely identifies the user. At first I considered a combination of header request and IP but decided to keep it simple and use just IP. Also that way a user can not increase the views of their video by using a different browser.

This is how I would think to do it:

  1. When a user visits I do a SELECT similar to this:

    SELECT 1 FROM tbl_log WHERE IP = $usersip AND video_id = $video_id

  2. if there is no result then I must insert a record

    INSERT into tbl_log (IP,video_id) VALUES ($usersip, $video_id)

  3. and increase the views by 1

    SELECT views FROM tbl_video WHERE video_id = $video_id

    UPDATE tbl_video SET views = $result['views'] + 1 WHERE video_id = $video_id

Questions

  • I guess I do not want to have
    millions of log records slowing down my site so should I run a cron job to empty the log table once a day?

  • Should I make the views
    transactional? (I guess a slightly
    depreciated view count is less
    important than a slow site because of row locks)

  • Is there a way to reduce the load on the mysql server.... I fear if every view requires an increased view count and an IP log that it will be pretty expensive. I have seen that youtube
    and the like do not update the views instantly... do they cache the
    updates some how and then run them at once? if so how?

  • How efficient is my system? Can you think of any improvements?

+1  A: 

Here are some ideas for improvements you can make.

Set a primary key on tbl_log to be IP + video_id. Then you can simply do a

 REPLACE INTO tbl_log (IP,video_id) VALUES ($usersip, $video_id)

(Be sure to escape those php variables to avoid SQL injection.)

Now you're updating your log table with only one query. Next, you can update the views field in tbl_video periodically with something like:

UPDATE tbl_video SET views = (select count(*) from tbl_log where video_id = $video_id) where video_id = $video_id

You can do that with a cron job, or you can add a 'last_count_update' field and update the video when it is accessed if the last count time is older than 2 hours or whatever. This will be a little less work if you have a bunch of videos that aren't visited often.

Scott Saunders
Both your suggestions make a lot of sense, only problem I see with the second one is that it does not clear the log... so every time it is executed it will count the results regardless of if they have already been counted. I guess I have two options... have the life of the log equal to when the views update (by having the log emptied when the views are counted) OR by adding a counted field to stop a log being counted multiple times. What do you think?
Mark
Yes, it counts the results regardless of whether they've been counted. But it doesn't ADD that count to the existing count - it replaces the existing count.
Scott Saunders
I don't see any way to guarantee unique views if you're clearing the log table. You'll either have to live with a big table or change your definition of unique. If, for example, you decide that unique means per IP and per day or week, then you can clear the table of any records from the previous day/week (or older) after you count them. But then you would have to deal with adding them rather than replacing the count. Then you would need something like the counted field you mentioned.
Scott Saunders
true... but do I really want eternal records to ensure unique views? Surely every day that goes by the check to see if the user has visited before and the count would take longer and longer to execute?
Mark
In my eyes, the unique view is less about counting individual people (which as many users are behind a router at work/school is incorrect anyway) and more about stopping people repeatedly pressing refresh to increase their view count....
Mark
+1  A: 

I guess I do not want to have millions of log records slowing down my site so should I run a cron job to empty the log table once a day?

Consider using mysql's ON DUPLICATE KEY UPDATE syntax to avoid using a SELECT which would have an expensive WHERE clause. If your log table also had a timestamp column, you could refresh that value.

INSERT into tbl_log (IP,video_id) VALUES ($usersip, $video_id) ON DUPLICATE KEY UPDATE time_recorded = now();

This would require you to have a UNIQUE constraint on the IP and video_id columns.

Should I make the views transactional? (I guess a slightly depreciated view count is less important than a slow site because of row locks)

No, because you can achieve this with a single UDPATE query.

UPDATE tbl_video SET views = views + 1 WHERE video_id = $video_id

Is there a way to reduce the load on the mysql server.... I fear if every view requires an increased view count and an IP log that it will be pretty expensive. I have seen that youtube and the like do not update the views instantly... do they cache the updates some how and then run them at once? if so how?

It's not too bad - there's really no other way to reliably capture record-view data. In the case of Youtube, it's more likely delayed writes or replication that's causing the delay you notice since they have hundreds of servers (although it's possible they are caching the value as well)

How efficient is my system? Can you think of any improvements?

Other than what I mentioned here already, not off the top of my head.

Peter Bailey
sorry I do not understand your first suggestion... how does having an ON DUPLICATE KEY UPDATE mean I do not have to empty the log? Do you mean it will be faster to execute the SQL because there is no WHERE and thus I do not have to empty the log?
Mark
Yes, that is correct.
Peter Bailey