ansaurus

Question

Answer 1

+1 A:

I think that the problem you may be seeing with your current implementation is that topics that were hot 23 hours ago are influencing your rankings right now. The problem I see with your new proposed implementation is that you're wiping the slate clean at midnight, so topics that were hot late last night won't seem hot early the next morning (but they should).

I suggest you look into implementing a Digg-style algorithm (sorry for linking to Digg) where the hotness of a topic decays with age. You could do this by counting the hits per hour for each of the last 24 one-hour periods, then dividing each period's score by how many hours ago that period took place. Add up the 24 weighted periods to get the score.

hotness = (score24 / 24) + (score23 / 23) + ... + (score2 / 2) + score1

Here score24 is the number of "hits" that a topic got in the one-hour period that occurred 24 hours ago (maybe not the hits exactly, but the normalized score for that hour).

This way topics that were hot 24 hours ago will still be counted in your algorithm, but not as heavily as topics that were hot an hour ago.
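As a rough illustration of the formula above, here is a minimal Python sketch. The function name `hotness` and the list layout (`hourly_scores[0]` is the most recent hour, i.e. score1) are assumptions made for the example, not part of any particular site's implementation.

```python
def hotness(hourly_scores):
    """Sum the hourly scores, each divided by how many hours ago it occurred.

    hourly_scores[0] is the most recent one-hour period (score1),
    hourly_scores[23] is the period 24 hours ago (score24).
    """
    return sum(
        score / hours_ago
        for hours_ago, score in enumerate(hourly_scores, start=1)
    )


# Two hypothetical topics with the same total hits, at different ages.
recent = [24] + [0] * 23   # 24 hits in the most recent hour
stale = [0] * 23 + [24]    # 24 hits in the hour 24 hours ago

print(hotness(recent))  # 24.0
print(hotness(stale))   # 1.0
```

Both topics have the same number of hits, but the one that was active an hour ago scores 24 times higher than the one that was active a day ago.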

Bill the Lizard 2009-06-16 20:10:18

Thank you, Bill the Lizard, for this tip. I didn't know this simple algorithm, but it's really cool. Unfortunately, it isn't suitable for my purpose, i.e. finding trending topics. My algorithm filters out the topics that are always hot. Your algorithm doesn't do that, does it? ;) It's still very useful for me, though, because I filter out trending links, too, and it suits that purpose. Your example concerning my algorithm and the time periods is also very good. So do you recommend the first approach (simply going 24h back instead of starting at 0:00)?

2009-06-17 17:15:10

After going back and re-reading the question you linked to, I see the problem with this suggestion. You're right, it doesn't filter out topics that are always hot. Digg and reddit work with this algorithm because it only applies to a single link, not an entire topic, which might be represented by many hits. Of your two choices, I would favor going back 24 hours, only because I can't imagine how the system will work at 1AM if you only go back to 0:00. Maybe you could split the difference (in a way) and only go back 12 hours?
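To make the 1AM concern concrete, here is a small Python sketch comparing the two time frames. The function names and the sample timestamps are hypothetical, chosen only to illustrate the difference between a rolling window and a window that resets at 0:00.

```python
from datetime import datetime, timedelta


def hits_in_rolling_window(hit_times, now, window_hours=24):
    """Count hits in the last `window_hours` hours, regardless of the date."""
    cutoff = now - timedelta(hours=window_hours)
    return sum(1 for t in hit_times if cutoff < t <= now)


def hits_since_midnight(hit_times, now):
    """Count hits since 0:00 of the current day (the slate is wiped at midnight)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return sum(1 for t in hit_times if midnight <= t <= now)


# Hypothetical topic that was active just before and just after midnight.
hits = [
    datetime(2009, 6, 16, 23, 30),
    datetime(2009, 6, 16, 23, 45),
    datetime(2009, 6, 17, 0, 30),
]
now = datetime(2009, 6, 17, 1, 0)

print(hits_in_rolling_window(hits, now))      # 3
print(hits_in_rolling_window(hits, now, 12))  # 3
print(hits_since_midnight(hits, now))         # 1
```

At 1AM the midnight-based count sees only one of the three hits, while a 24-hour or 12-hour rolling window still sees all of them.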

Bill the Lizard 2009-06-17 18:18:50

Yes, the second approach would probably fail if some topics are hot shortly before 0:00. But the disadvantage of the first approach is that I can't store the data for previous days if I always go back 24h ...

2009-06-17 18:46:44


Time frames for Standard score
