views:

189

answers:

3

I'm working on a site similar to digg in the respect that users can submit "stories".

I keep track of how many "votes" and "similar adds" each item got. Similar adds are defined as two users adding the same "link".

Here is part of the algorithm (essentially the most important):

y = day number 
sy = number of adds on day y

∑ y[1:10]  sy / y

So basically calculate the number of "similar adds" on a specified day and divide by the number of seconds since the content was posted. Do this for the past 10 days (as an example).

However, I'm not sure how to implement this so that it performs well. Every method I can think of will be really slow.

The only way I can think of implementing this is by calculating the number of adds for the past 10 days for each item submitted will take forever. (so a sql command with a group by date executed 10 times for the past 10 days - obviously this method sucks).

Even if I keep a table that I update once a day (and run the above sql in the background), that will still be ridiculously slow once the database gets large. Plus the rating will be "outdated" since it's not live (e.g. breaking news "items" will never reach the top).

Does anyone have any experience of how to go about doing this?

A: 

There wouldn't be a need to execute SQL 10 times to get the result, you could get it in one execution something like:

select sum(dayval)
from
( select count(*) / (current_date-day+1) dayval
  from votes
  where story_id = 123
  and day >= current_date - 9
  group by (current_date-day+1)
)

(Actual code varies according to DBMS used).

I'm not claiming this would perform well though!

Perhaps a compromise: calculate and store the "start of day" value in a daily batch process, but then add 1 to the stored value for each vote received during the day?

Tony Andrews
You're right, I forgot I can just do it this way in one statement. But it would still be slow. I'm wondering how sites with a complex algorithm handle these problems.
rksprst
+2  A: 

Try this one: everyone has one vote. Your vote sticks to the last thing you voted for. The time thing would come from user behaviour.

This is an interesting idea. You could expand on it too. Maybe give people 10 votes, and when they use them all up, their oldest one gets transferred to the newest "story" they've voted for.
Bill the Lizard
This would accomplish the same task in counting "votes", but it won't give priority to the most recent submissions.
rksprst
+1  A: 

there is a logarithmic weighted average that you could do. the advantage of this is that you only ever need to store the "current value" and the weighted average. In your case the "current value" could be number of votes that day, and you could recalculate the weighted average each night.

const float WeightFactor = 0.70; //for example
float PreviousAverage = GetPreviousAverage();
float CurrentValue = GetVoteCountToday();

float NewAverage = (WeightFactor * CurrentValue) + ( (1-WeightFactor) * PreviousAverage);

This only really works if you have a new value that occurs at a set frequency. If you want to recalculate your vote at arbritary times then this wouldn't work.

frankster