views:

545

answers:

2

On my blog, I display in the right nav the 10 most popular articles in terms of page hits. Here's how I get that:

SELECT *
FROM entries
WHERE is_published = 1
ORDER BY hits DESC, created DESC
LIMIT 10

What I would like to do is show the top 10 in terms of page hits per day. I'm using MySQL. Is there a way I can do this in the database?

BTW, The created field is a datetime.

UPDATE: I think I haven't made myself clear. What I want is for the blog post with 10,000 hits that was posted 1,000 days ago to have the same popularity as the blog post with 10 hits that was posted 1 day ago. In pseudo-code:

ORDER BY hits / days since posting

...where hits is just an int that is incremented each time the blog post is viewed.

OK, here's what I'm going to use:

SELECT *, AVG(
    hits / DATEDIFF(NOW(), created)
) AS avg_hits
FROM entries
WHERE is_published = 1
GROUP BY id
ORDER BY avg_hits DESC, hits DESC, created DESC 
LIMIT 10

Thanks, Stephen! (I love this site...)

+5  A: 

I'm not entirely sure you can by using the table structure you suggest in your query. The only way I can think of is to get the top 10 by way of highest average hits per day. By doing that, your query becomes:

SELECT *, AVG(hits / DATEDIFF(NOW(), created)) as avg_hits
FROM entries
WHERE is_published = 1
GROUP BY id
ORDER BY avg_hits DESC
LIMIT 10

This query assumes your created field is of a DATETIME (or similar) data type.

Stephen
This is the right idea, but wouldn't I want to ORDER BY avg_hits?
Andrew Hedges
Yeah, I've had about 5 answers in the last 5 minutes and I seem to keep forgetting things...
Stephen
There's a typo in your query, btw. You need another closing paren in the select.
Andrew Hedges
Actually, I don't think you would even need the ORDER BY created part at all, since you're looking for the top articles without really taking time into account. Maybe order by title to keep it looking clean?
Stephen
I want them ordered by "popularity", but what I'm finding strange is your query is only returning 1 result. Weird.
Andrew Hedges
Ah, I added the GROUP BY clause and it's all shiny. Thanks!
Andrew Hedges
+1  A: 

I guess you could have a hits_day_count column, which is incremented on each view, and a hits_day_current.

On each page-view, you check if the hits_day_current column is today. If not, reset the hit count.. Then you increment the hits_day_count column, and set hits_day_current to the current datetime.

Pseudo-code:

if article_data['hits_day_current'] == datetime.now():
    article_data['hits_day_count'] ++
else:
    article_data['hits_day'] = 0

article_data['hits_day_current'] = datetime.now()

The obvious problem with this is simple - timezones. The totals get reset at 00:00 wherever the server is located, which may not be useful.

A better solution would be a rolling-24-hour total.. Not quite sure how to do this neatly. The easiest (although not so elegant) way would be to parse your web-server logs periodically. Get the last 24 hours of logs, count the number of requests to each article, and put those numbers in the database.

dbr
To do what you're suggesting would best be done by another language in concert with MySQL. Have an extra field (hits_day) that represents the day of the hits. When adding to the db, first check if that day is today. If so, +1. If not, reset back to 1 and make hits_day equal to today.
Stephen
The only reason I would take this approach would be performance. I could run a cron job to calculate popularity nightly, then I'm back to a simple SELECT. For my purposes, though, it's fine just to calculate this on the fly.
Andrew Hedges