views:

155

answers:

4

Hi guys,

I'd like to rank my stories based on a "controversy" quotient. For example, reddit.com currently has a "controversial" section: http://www.reddit.com/controversial/

When a story has a lot of up votes and a lot of down votes, it's controversial even though the total score might be 0. How should I calculate this quotient so that it captures cases where a lot of people are voting both up and down?

Thanks!!!

Nick

A: 

The easiest method is to count the number of upvote/downvote pairings for a given comment within a timeframe (e.g. 1 week, 48 hours, etc.), and have the comments with the most pairings appear first. Anything more complex requires trial and error to find the best algorithm. As always, it depends on the content of the site and how you want it weighted.

Overall, it's not much different than a hotness algorithm, which works by detecting the most upvotes or views within a timeframe.
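
To make the pairing count concrete, here is a minimal sketch in Python. It assumes each vote is stored as a (timestamp, value) tuple with value +1 or -1; the function name `pairings_in_window` and the sample data are made up for illustration and are not from any particular site's code.

```python
from datetime import datetime, timedelta

def pairings_in_window(votes, now, window):
    """Count upvote/downvote pairings among votes cast within `window` before `now`.

    `votes` is an iterable of (timestamp, value) tuples, value being +1 or -1
    (an assumed storage layout, chosen only for this sketch).
    """
    cutoff = now - window
    ups = sum(1 for ts, value in votes if ts >= cutoff and value > 0)
    downs = sum(1 for ts, value in votes if ts >= cutoff and value < 0)
    # Each downvote can be paired with at most one upvote, and vice versa.
    return min(ups, downs)

# Hypothetical sample data
now = datetime(2009, 1, 8)
votes = [
    (datetime(2009, 1, 7), +1),
    (datetime(2009, 1, 7), -1),
    (datetime(2009, 1, 6), +1),
    (datetime(2008, 12, 1), -1),  # too old for a 48-hour window
]
recent_pairings = pairings_in_window(votes, now, timedelta(hours=48))
all_pairings = pairings_in_window(votes, now, timedelta(days=60))
```

Sorting stories by this value in descending order would put the ones with the most pairings first.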

Raymond Martineau
I believe counting "pairings" is logically equivalent to min(upvotes, downvotes) - and therefore equivalent to Wagner Silveira's answer. It has the same scaling problem (pointed out by Ambush Commander) as Wagner's answer.
Oddthinking
A: 

What about simply taking the smaller of the two values (up or down) at a point in time? If it goes up a lot and down only a little, or the other way around, it is not controversial.

If, for example, the item has 10 ups and 5 downs, the "controversiality level" is 5, since there are 5 people disagreeing about liking it or not. On the other hand, if it has either 10 ups or 10 downs and nothing else, the "controversiality level" is 0, since no one is disagreeing.

So in the end the smaller of the two values defines the "hotness" or the "controversiality". Does this make sense?
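
As a rough sketch of the idea in Python (the function name `controversiality_level` is just an illustrative choice):

```python
def controversiality_level(ups, downs):
    """Return the smaller of the two vote counts."""
    return min(ups, downs)

# The examples from above, plus two more for comparison
levels = {
    (10, 5): controversiality_level(10, 5),    # 5 people disagree
    (10, 0): controversiality_level(10, 0),    # nobody disagrees
    (0, 10): controversiality_level(0, 10),    # nobody disagrees
    (10, 10): controversiality_level(10, 10),
    (1_000_000, 20): controversiality_level(1_000_000, 20),
}
for (ups, downs), level in levels.items():
    print(f"{ups} up / {downs} down -> {level}")
```

Note that this is a raw count, so it ignores how lopsided the vote is: 1,000,000 up against 20 down gets a level of 20, which is higher than the 10 that an even 10/10 split gets.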

Wagner Silveira
You would still need to scale it somehow: 1000000 to 20 is not more controversial than 10 to 10
Edward Z. Yang
+5  A: 

I would recommend using the standard deviation of the votes.

A controversial vote that's 100% polarised would have equal numbers of -1 and +1 votes, so the mean would be 0 and the stddev would be around 1.0 (exactly 1.0 if you use the population standard deviation).

Conversely a completely consistent set of votes (with no votes in the opposite direction) would have a mean of 1 or -1 and a stddev of 0.0.

Votes that aren't either completely consistent or completely polarised will produce a standard deviation figure between 0 and ~1.0 where that value will indicate the degree of controversy in the vote.
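
Here is a small Python sketch of this, assuming votes are represented as +1/-1 values and using the population standard deviation from the standard library's `statistics.pstdev`. The helper name `vote_stddev` is made up for the example.

```python
import statistics

def vote_stddev(ups, downs):
    """Population standard deviation of a list of +1/-1 votes."""
    votes = [1] * ups + [-1] * downs
    if not votes:
        # No votes at all: treat as no controversy (an assumption of this sketch).
        return 0.0
    return statistics.pstdev(votes)

polarised = vote_stddev(50, 50)    # evenly split
consistent = vote_stddev(10, 0)    # everyone agrees
mixed = vote_stddev(75, 25)        # somewhere in between
```

One thing to keep in mind: the result depends only on the proportion of up to down votes, not on how many votes there are, so a 1/1 split gives the same value as a 500/500 split. You may want to combine it with the vote count if volume matters to you.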

Alnitak
This isn't a bad idea. I remember from my stats days that there are a bunch of statistical methods for analyzing multi-modal distributions specifically, but I couldn't find anything online just now. Probably overkill anyway.
MusiGenesis
it's pretty easy to calculate - heck, if your votes are in MySQL as +/-1 values you can use its built-in stddev() function. In any case, this is _the_ simplest standard statistical test for the amount of variability in a set.
Alnitak
A: 

// figure out if up or down is winning - doesn't matter which
if (up_votes > down_votes)
{
    win_votes = up_votes;
    lose_votes = down_votes;
}
else
{
    win_votes = down_votes;
    lose_votes = up_votes;
}
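// losewin_ratio is always <= 1, near 0 if win_votes >> lose_votes
// (cast to avoid integer division; guard against a story with no votes)
losewin_ratio = (win_votes > 0) ? (double)lose_votes / win_votes : 0.0;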
total_votes = up_votes + down_votes;
controversy_score = total_votes * losewin_ratio; // large means controversial

This formula will produce high scores for stories that have a lot of votes and a near 50/50 voting split, and low scores for stories that have either few votes or many votes for one choice.
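
For a quick way to try the formula on a few inputs, here is the same calculation sketched as a Python function. The function wrapper and the zero-vote guard are additions for the example, not part of the original snippet.

```python
def controversy_score(up_votes, down_votes):
    """Total votes scaled by the loser/winner ratio; larger means more controversial."""
    win_votes = max(up_votes, down_votes)
    lose_votes = min(up_votes, down_votes)
    if win_votes == 0:
        # No votes at all: nothing to be controversial about (assumed behaviour).
        return 0.0
    losewin_ratio = lose_votes / win_votes  # always between 0 and 1
    return (up_votes + down_votes) * losewin_ratio

even_split = controversy_score(100, 100)   # many votes, 50/50
one_sided = controversy_score(100, 0)      # many votes, all one way
few_votes = controversy_score(2, 2)        # 50/50 but hardly any votes
partial = controversy_score(10, 5)
```

Here the 100/100 story scores far above both the one-sided story and the 2/2 story, which matches the behaviour described above.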

MusiGenesis