views:

76

answers:

4

Hi there,

I have a site where the users can view quite a large number of posts. Every time this is done I run a query similar to UPDATE table SET views=views+1 WHERE id = ?. However, there are a number of disadvantages to this approach:

  • There is no way of tracking when the pageviews occur - they are simply incremented.
  • Updating the table that often will, as far as I understand it, clear the MySQL cache of the row, thus making the next SELECT of that row slower.

Therefore I consider employing an approach where I create a table, say:
object_views { object_id, year, month, day, views }, so that each object has one row pr. day in this table. I would then periodically update the views column in the objects table so that I wouldn't have to do expensive joins all the time.

This is the simplest solution I can think of, and it seems that it is also the one with the least performance impact. Do you agree?

(The site is build on PHP 5.2, symfony 1.4 and Doctrine 1.2 in case you wonder)

Edit:
The purpose is not web analytics - I know how to do that, and that is already in place. There are two purposes:

  • Allow the user to see how many times a given object has been shown, for example today or yesterday.
  • Allow the moderators of the site to see simple view statistics without going into Google Analytics, Omniture or whatever solution. Furthermore, the results in the backend must be realtime, a feature wich GA cannot offer at this time. I do not wish to use the Analytics API to retrieve the usage data (not realtime, GA requires javascript).
A: 

First just a quick remark why not aggregate the year,month,day in DATETIME, it would make more sense in my mind.

Also I am not really sure what is the exact reason you are doing that, if it's for a marketing/web stats purpose you have better to use tool made for that purpose.

Now there is two big family of tool capable to give you an idea of your website access statistics, log based one (awstats is probably the most popular), ajax/1pixel image based one (google analytics would be the most popular).

If you prefer to build your own stats database you can probably manage to build a log parser easily using PHP. If you find parsing apache logs (or IIS logs) too much a burden, you would probably make your application ouput some custom logs formated in a simpler way.

Also one other possible solution is to use memcached, the daemon provide some kind of counter that you can increment. You can log view there and have a script collecting the result everyday.

RageZ
I already use Google Analytics and clicktale, so I am well covered on the web analytics part. I see your point regarding the `DATE` field, but calculations regarding 'how many views did I have in November' would, as far as I know, be faster if you simply have 3 integer fields instead. Parsing logs is definately not an option that I'd go for.
phidah
For logging I usually use separate year, month, day, hour columns because it makes generating stats much easier. `GROUP BY hour, day, month, year` for hourly stats, `GROUP BY month, year` for monthly stats, etc.
Rob
I updated the question to better reflect what I am looking for.
phidah
A: 

Along the same lines as Rage, you simply are not going to get the same results doing it yourself when there are a million third party log tools out there. If you are tracking on a daily basis, then a basic program such as webtrends is perfectly capable of tracking the hits especially if your URL contains the ID's of the items you want to track... I can't stress this enough, it's all about the URL when it comes to these tools (Wordpress for example allows lots of different URL constructs)

Now, if you are looking into "impression" tracking then it's another ball game because you are probably tracking each object, the page, the user, and possibly a weighted value based upon location on the page. If this is the case you can keep your performance up by hosting the tracking on another server where you can fire and forget. In the past I worked this using SQL updating against the ID and a string version of the date... that way when the date changes from 20091125 to 20091126 it's a simple query without the overhead of let's say a datediff function.

UJ
I updated the question to better reflect what I am looking for.
phidah
A: 

If you're going to do that, why not just log each access? MySQL can cache inserts in continuous tables quite well, so there shouldn't be a notable slowdown due to the insert. You can always run Show Profiles to see what the performance penalty actually is.

On the datetime issue, you can always use GROUP BY MONTH( accessed_at ) , YEAR( accessed_at) or WHERE MONTH(accessed_at) = 11 AND YEAR(accessed_at) = 2009.

jdswift
That would result in a huge amount of rows, then I'd prefer to aggregate the data a bit before storing it.
phidah
A: 
Georgi