What's a good metric for finding the most active forum thread or game in your database?

Imagine you run a forum like 4chan. You want the most active threads to appear on the first page. You tried sorting topics by last_updated, but the result is chaotic: the threads you see on each refresh are effectively random, and jumping to the second page may show you many of the same results. There must be a more stable algorithm for determining active threads!

Imagine you run a website where people can play and watch games. You want people to see how exciting these games can be the moment they visit your front page. Interacting in your game can be boiled down to generating individual events. But you can't just sort by last_updated because some people play very slowly, and you want to find games that are exciting.

For bonus points, think about how you'd construct a SQL query for maximum activity, or how you could implement this in a server-side cache. Best answers do not require a cron job to preen the data.

A: 

You can't rely on last_updated by itself. To find the threads or games that are most active over all time, use counters such as reply_count/play_count and view_count/played_count. To find the hottest game right now, you may also need a field such as now_playing_count on each game.

static
A: 

A problem somewhat related to yours is called "The Britney Spears Problem", which is about the difficulty of determining hot topics algorithmically. From an AI point of view it is a difficult problem. First, there is no fixed number of topics, so classification is out of the question. Second, trends change over time, so the model needs to take time into account (a typical neural network doesn't, unless you're talking about a Time Delay Neural Network). Finally, what's hot and what's not is subjective and differs from person to person, which means you may need to take each person's past interests into account (Collaborative Filtering).

Hao Wooi Lim
I'm just looking for a measure of activity: that is, lots of recent events.
Nick Retallack
+1  A: 

In the forum example, the hottest threads are the ones with the most comments posted, so you just count the comments posted in the current day/week/month (whatever time frame you decide constitutes 'hot') and order the threads by that count.

SELECT p.id, p.title, COUNT(c.created_at) AS comment_count
FROM posts p
JOIN comments c ON c.post_id = p.id
WHERE c.created_at > ***TIME YOU DETERMINE AS HOT***
GROUP BY p.id, p.title
ORDER BY comment_count DESC

Your games scenario would work the same way, assuming you have a similar table setup for those data models.

** Note: any non-aggregated column that you put in the SELECT also has to appear in the GROUP BY clause. **
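To make the windowed count concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The schema, the sample rows, the `hottest_posts` helper, and the `limit` parameter are assumptions made for illustration; only the `posts`/`comments` table names and the `post_id`/`created_at` columns come from the query above. Timestamps are stored as ISO-8601 text so they compare correctly as strings.

```python
import sqlite3


def hottest_posts(conn, cutoff, limit=10):
    """Return (id, title, comment_count) for posts commented on after `cutoff`,
    most-commented first. `cutoff` is an ISO-8601 timestamp string."""
    return conn.execute(
        """
        SELECT p.id, p.title, COUNT(c.id) AS comment_count
        FROM posts p
        JOIN comments c ON c.post_id = p.id
        WHERE c.created_at > ?
        GROUP BY p.id, p.title
        ORDER BY comment_count DESC, p.id   -- p.id breaks ties deterministically
        LIMIT ?
        """,
        (cutoff, limit),
    ).fetchall()


# Hypothetical schema and data, just enough to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, created_at TEXT);
""")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(1, "old but busy"), (2, "new and busy"), (3, "quiet")],
)
conn.executemany(
    "INSERT INTO comments (post_id, created_at) VALUES (?, ?)",
    [
        (1, "2024-01-01 10:00:00"), (1, "2024-01-01 11:00:00"), (1, "2024-01-02 09:00:00"),
        (2, "2024-01-07 09:00:00"), (2, "2024-01-07 09:05:00"),
        (3, "2024-01-07 08:00:00"),
    ],
)

# Only comments newer than the cutoff count towards "hot".
hot = hottest_posts(conn, cutoff="2024-01-06 00:00:00")
print(hot)
```

Post 1 has the most comments overall, but none of them fall inside the window, so it drops out of the result. Because the cutoff is computed at query time, nothing has to run periodically to expire old activity.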

ErsatzRyan
A: 

Pseudocode:

SELECT topic_id, date, COUNT(*) AS count GROUP BY topic_id, date ORDER BY count DESC

Omar Kooheji
A: 

You tried sorting topics by last_updated, but the result is chaotic: the threads you see on each refresh are effectively random, and jumping to the second page may show you many of the same results.

You can remember the exact time the user loaded the first page, and only show rows whose last_updated is less than or equal to that time:

SELECT  t.id, t.name, p.last_updated
FROM    threads t
JOIN    posts p
ON      p.thread_id = t.id
        AND p.last_updated <= @last_updated
ORDER BY
        p.last_updated DESC

This will give you a stable result set.

Update the variable only when the user refreshes the front page (and not when they click through to page 1, page 2, etc.).
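As a rough illustration of why the frozen timestamp keeps pages stable, here is a small sketch using Python's sqlite3 module. The schema, sample rows, `PAGE_SIZE`, and the `thread_page` helper are hypothetical. It also groups by thread and takes the latest qualifying post per thread, which is an addition to the query above, so that each thread appears once per listing.

```python
import sqlite3

PAGE_SIZE = 2  # hypothetical page size, small so the demo spans two pages


def thread_page(conn, snapshot, page):
    """Return one page (0-based) of threads, ordered by their latest post
    at or before `snapshot`. Posts newer than the snapshot are ignored."""
    return conn.execute(
        """
        SELECT t.id, t.name, MAX(p.last_updated) AS last_activity
        FROM threads t
        JOIN posts p ON p.thread_id = t.id AND p.last_updated <= ?
        GROUP BY t.id, t.name
        ORDER BY last_activity DESC, t.id
        LIMIT ? OFFSET ?
        """,
        (snapshot, PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()


# Assumed schema: one row per post, with ISO-8601 text timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE threads (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, thread_id INTEGER, last_updated TEXT);
""")
conn.executemany("INSERT INTO threads VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany(
    "INSERT INTO posts (thread_id, last_updated) VALUES (?, ?)",
    [(1, "2024-01-07 09:00:00"), (2, "2024-01-07 09:01:00"), (3, "2024-01-07 09:02:00")],
)

# Recorded once, when the user loads the front page.
snapshot = "2024-01-07 09:05:00"
page0 = thread_page(conn, snapshot, 0)

# A new post lands in thread "a" while the user is still reading page 0.
conn.execute(
    "INSERT INTO posts (thread_id, last_updated) VALUES (1, '2024-01-07 09:10:00')"
)

# Page 1 uses the same snapshot, so the new post does not reshuffle the order.
page1 = thread_page(conn, snapshot, 1)
print(page0, page1)
```

Without the snapshot filter, the new post would move thread "a" to the top and push thread "b" onto the second page, where the user would see it a second time.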

Quassnoi