views: 67
answers: 3
I have this blog-like system (LAMP), and I'd like to track the number of views of every article. Is it better to update the article's views column every time the article is viewed, or to use some temp table where I'd store only the article ID, and then (say, every hour) run a query that takes the data from the temp table and updates the rows in the articles table? I'm open to completely different solutions.

Note that I can't use any analytics tools as I need to work with these numbers (most popular etc.).

+2  A: 

Updating the article table each time an article is read will mean more locking of this table (or the rows, depending on the engine you are using).

Using a temp table might be a better solution, in my opinion:

  • either do a raw insert each time an article is viewed, and no update
  • or update a counter per article, in that temp table
  • or (if you are using an engine like InnoDB that supports row locks, and doesn't use table locks) use something like 100 rows per article, and update one of those randomly each time an article is viewed
    • this way, you'll have less contention on the locks (if you have 5 users reading the same article at the exact same time, there's not much risk that they'll all try to update the same one line among the 100!)
    • just remember that you'll have to sum the values of the 100 lines per article to get the grand total when you want to count how many times an article has been viewed.

The last solution is probably the best in terms of concurrency -- once again, if you are using an engine that supports row locks (i.e., not MyISAM).

And, once in a while, run a cron job that will count from that temp table, and update the article table.
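A minimal sketch of that last scheme, using Python with SQLite standing in for MySQL (the `view_slots` table, column names, and the 100-slot count are all made up for illustration; in MySQL you'd use `INSERT ... ON DUPLICATE KEY UPDATE` instead of SQLite's `ON CONFLICT` clause):

```python
import random
import sqlite3

SLOTS = 100  # counter rows per article, to spread lock contention

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, views INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE view_slots (
        article_id INTEGER NOT NULL,
        slot INTEGER NOT NULL,
        hits INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (article_id, slot)
    );
    INSERT INTO articles (id) VALUES (1);
""")

def record_view(article_id):
    # Pick one of the SLOTS rows at random; concurrent readers of the same
    # article will usually hit different rows, so row locks rarely collide.
    slot = random.randrange(SLOTS)
    conn.execute(
        "INSERT INTO view_slots (article_id, slot, hits) VALUES (?, ?, 1) "
        "ON CONFLICT(article_id, slot) DO UPDATE SET hits = hits + 1",
        (article_id, slot),
    )
    conn.commit()

def flush_counts():
    # The periodic cron job: fold the slot counters into articles.views
    # and reset them, all in one transaction.
    with conn:
        conn.execute("""
            UPDATE articles SET views = views + COALESCE(
                (SELECT SUM(hits) FROM view_slots
                 WHERE view_slots.article_id = articles.id), 0)
        """)
        conn.execute("DELETE FROM view_slots")

for _ in range(250):
    record_view(1)
flush_counts()
print(conn.execute("SELECT views FROM articles WHERE id = 1").fetchone()[0])  # 250
```

Note that the random slot only helps when the engine takes row-level locks; with table-level locking (MyISAM), all 100 rows sit behind the same lock anyway.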

Pascal MARTIN
+2  A: 

Might this be a case of premature optimization? Before going to the extreme of separate tables and cron jobs, I would make sure the simple approach actually is a problem once it's tuned correctly.

Besides, your problem is write-lock contention; by writing to another table, you've just moved that contention to the other table, and you'll see the same blocking there.

I would suggest:

  1. Make your reads without locks and only your writes with locks (NOLOCK is the SQL Server hint for this; with MySQL's InnoDB, plain SELECTs are already non-locking consistent reads). That way you only block on simultaneous updates of the view count, not on reads of the article data.
  2. If that's not good enough, and you can live with some edge-case loss of view counts, do the view-count update asynchronously, and don't wait for it to finish before showing the page.

(By edge-case loss of view counts, I mean instances where the async write fails after you delivered the page -- e.g., your DB went down just after the article data was read but before the view count was updated.)
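Point 2 could be sketched like this: a fire-and-forget writer where the page handler enqueues the view and returns immediately, and a background thread does the actual update. (The in-memory `counts` dict stands in for the database; everything here is illustrative, not the poster's code.)

```python
import queue
import threading

view_queue = queue.Queue()
counts = {}  # stand-in for the database; imagine an UPDATE per drained id

def writer():
    # Single background thread drains the queue and performs the writes.
    # If the process dies with items still queued, those counts are lost --
    # the "edge case loss" the answer accepts.
    while True:
        article_id = view_queue.get()
        if article_id is None:  # shutdown sentinel
            break
        # Real system: UPDATE articles SET views = views + 1 WHERE id = ?
        counts[article_id] = counts.get(article_id, 0) + 1
        view_queue.task_done()

worker = threading.Thread(target=writer, daemon=True)
worker.start()

def serve_article(article_id):
    view_queue.put(article_id)               # does not block on the database
    return f"<html>article {article_id}</html>"  # page goes out immediately

for _ in range(5):
    serve_article(42)
view_queue.join()  # only the demo waits; a web handler never would
print(counts[42])  # 5
```

A single writer thread also serialises the counter updates, which incidentally removes the write-lock contention on the views column.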

turnhose
+1  A: 

"The most efficient way" is quite subjective; you'll have to enlighten us to your specific performance problem.

I'd probably have the page views appended (on each web server in the farm) to a local log file (atomically, of course), then have a process that periodically rotates the file and summarises it into the database (handling concurrent access correctly, naturally; that is left as an exercise for the reader).

The summariser would count the number of views of each article in the log file for a period (say, run it every minute or two), then apply however many updates are necessary, one per article, in a single transaction. These would probably not cause much of a problem, as you'd be looking at only one process per web server doing one transaction per minute (or two, or five, or however many) rather than one per web request. There would be a lot less load on the database.
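The append-then-summarise flow could look roughly like this (a Python sketch with SQLite standing in for the real database; the file name and schema are invented, and the rotate-via-rename trick assumes a POSIX filesystem where `rename` is atomic):

```python
import os
import sqlite3
import tempfile
from collections import Counter

LOG = os.path.join(tempfile.mkdtemp(), "views.log")

def log_view(article_id):
    # Called by the page handler: one short appended line, no DB work.
    # (Small writes to a file opened in append mode are atomic on POSIX.)
    with open(LOG, "a") as f:
        f.write(f"{article_id}\n")

def summarise(conn):
    # The periodic job: rotate the log so new views go to a fresh file,
    # then apply one UPDATE per article inside a single transaction.
    if not os.path.exists(LOG):
        return
    rotated = LOG + ".processing"
    os.replace(LOG, rotated)  # atomic rename; writers reopen a new file
    with open(rotated) as f:
        tallies = Counter(int(line) for line in f if line.strip())
    with conn:  # one transaction for all the updates
        for article_id, n in tallies.items():
            conn.execute("UPDATE articles SET views = views + ? WHERE id = ?",
                         (n, article_id))
    os.remove(rotated)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, views INTEGER DEFAULT 0)")
conn.execute("INSERT INTO articles (id) VALUES (7)")
for _ in range(3):
    log_view(7)
summarise(conn)
print(conn.execute("SELECT views FROM articles WHERE id = 7").fetchone()[0])  # 3
```

The nice property of this design is that the hot path touches only the local filesystem; the database sees one batched transaction per summariser run instead of one write per page view.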

MarkR