ansaurus

Question

How to rank stories based on "controversy"?

Answer 1

A:

The easiest method is to count the number of upvote/downvote pairings for a given comment within a timeframe (e.g. 1 week or 48 hours) and have the comments with the most pairings appear first. Anything more complex requires trial and error to find the best algorithm - as always, it depends on the content of the site and how you want it weighted.

Overall, it's not much different than a hotness algorithm, which works by detecting the most upvotes or views within a timeframe.
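A minimal sketch of the pairing count, assuming votes are stored as (timestamp, value) tuples with a value of +1 or -1. The function name `pairing_count` and the data layout are hypothetical, chosen only for illustration:

```python
from datetime import datetime, timedelta

def pairing_count(votes, now, window):
    """Count upvote/downvote pairs cast within `window` before `now`.

    `votes` is an iterable of (timestamp, value) tuples, where value is
    +1 or -1 (an assumed storage format, not taken from the answer).
    """
    cutoff = now - window
    recent = [value for ts, value in votes if cutoff <= ts <= now]
    ups = sum(1 for v in recent if v > 0)
    downs = sum(1 for v in recent if v < 0)
    # Each downvote can be paired with at most one upvote, and vice versa.
    return min(ups, downs)

now = datetime(2008, 11, 16, 12, 0)
votes = [
    (now - timedelta(hours=1), +1),
    (now - timedelta(hours=2), -1),
    (now - timedelta(hours=3), +1),
    (now - timedelta(days=30), -1),  # outside the 48-hour window, ignored
]
print(pairing_count(votes, now, timedelta(hours=48)))
```

Sorting comments by this count in descending order would put the ones with the most pairings first.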

Raymond Martineau 2008-11-16 07:58:34

I believe counting "pairings" is logically equivalent to min(upvotes, downvotes), and therefore equivalent to Wagner Silveira's answer. It has the same scaling problem (pointed out by Ambush Commander) as Wagner's answer.

Oddthinking 2008-11-16 11:19:44

Answer 2

A:

What about simply taking the smaller of the two values (up or down) at a point in time? If it goes up a lot and down only a little, or the other way around, it is not controversial.

If, for example, the item has 10 ups and 5 downs, the "controversiality level" is 5, since there are 5 people disagreeing about liking it or not. On the other hand, if it has only 10 ups or only 10 downs, the "controversiality level" is 0, since no one is disagreeing.

So in the end, the smaller of the two values defines the "hotness" or the "controversiality". Does this make sense?
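As a rough sketch, the rule described above amounts to a one-line function. The name `controversy_level` is made up for this example:

```python
def controversy_level(ups, downs):
    """Smaller of the two vote counts: how many voters are 'matched' by a dissenter."""
    return min(ups, downs)

print(controversy_level(10, 5))   # 10 ups, 5 downs
print(controversy_level(10, 0))   # only ups
print(controversy_level(0, 10))   # only downs
```

The first call gives 5 and the other two give 0, matching the examples in the answer.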

Wagner Silveira 2008-11-16 08:06:32

You would still need to scale it somehow: 1000000 to 20 is not more controversial than 10 to 10.
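To illustrate the scaling problem, here is a small sketch comparing the raw minimum with one possible normalisation, the ratio of the smaller count to the larger. The names `raw_level` and `balance_ratio` are hypothetical, and the ratio is only one of several ways the value could be scaled:

```python
def raw_level(ups, downs):
    """Unscaled score: the smaller of the two counts."""
    return min(ups, downs)

def balance_ratio(ups, downs):
    """min/max of the two counts, in [0, 1]; 0.0 when there are no votes (an assumption)."""
    larger = max(ups, downs)
    return min(ups, downs) / larger if larger else 0.0

lopsided = (1_000_000, 20)
even = (10, 10)

# The raw minimum ranks the lopsided vote above the evenly split one...
print(raw_level(*lopsided), raw_level(*even))
# ...while the ratio ranks the evenly split vote higher.
print(balance_ratio(*lopsided), balance_ratio(*even))
```

The ratio alone ignores how many votes were cast, so it would probably need to be combined with the vote total in some way.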

Edward Z. Yang 2008-11-16 08:12:40

Answer 3

+5 A:

I would recommend using the standard deviation of the votes.

A controversial vote that's 100% polarised would have equal numbers of -1 and +1 votes, so the mean would be 0 and the stddev would be around 1.0.

Conversely, a completely consistent set of votes (with no votes in the opposite direction) would have a mean of 1 or -1 and a stddev of 0.0.

Votes that are neither completely consistent nor completely polarised will produce a standard deviation between 0 and ~1.0, and that value indicates the degree of controversy in the vote.
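A minimal sketch of this calculation, assuming votes are recorded as +1 and -1. It uses the population standard deviation from Python's `statistics` module and cross-checks it against the closed form 2·sqrt(p(1-p)), where p is the fraction of upvotes. The function names are made up for this example:

```python
import math
import statistics

def vote_stddev(ups, downs):
    """Population standard deviation of `ups` votes of +1 and `downs` votes of -1."""
    votes = [1] * ups + [-1] * downs
    if not votes:
        return 0.0  # no votes: treated as no controversy (an assumption)
    return statistics.pstdev(votes)

def vote_stddev_closed_form(ups, downs):
    """Same quantity without building the list: 2 * sqrt(p * (1 - p))."""
    total = ups + downs
    if total == 0:
        return 0.0
    p = ups / total
    return 2 * math.sqrt(p * (1 - p))

for ups, downs in [(50, 50), (10, 0), (9, 1)]:
    print(ups, downs, vote_stddev(ups, downs), vote_stddev_closed_form(ups, downs))
```

With the sample standard deviation (`statistics.stdev`, which divides by n - 1) an even split comes out slightly above 1.0 for small vote counts. Note also that the result depends only on the proportion of upvotes, not on how many votes were cast.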

Alnitak 2008-11-16 09:36:50

This isn't a bad idea. I remember from my stats days that there are a bunch of statistical methods for analyzing multi-modal distributions specifically, but I couldn't find anything online just now. Probably overkill anyway.

MusiGenesis 2008-11-16 12:16:30

It's pretty easy to calculate - heck, if your votes are in MySQL as +/-1 values you can use its built-in stddev() function. In any case, this is _the_ simplest standard statistical test for the amount of variability in a set.

Alnitak 2008-11-16 13:11:02

Answer 4

A:

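// Returns a large score for stories with many votes and a near-even split.
double controversy_score(int up_votes, int down_votes)
{
    int win_votes, lose_votes;
    // figure out if up or down is winning - doesn't matter which
    if (up_votes > down_votes)
    {
        win_votes = up_votes;
        lose_votes = down_votes;
    }
    else
    {
        win_votes = down_votes;
        lose_votes = up_votes;
    }
    // no votes at all: avoid dividing by zero
    if (win_votes == 0)
        return 0.0;
    // losewin_ratio is always <= 1, near 0 if win_votes >> lose_votes
    // (cast to double so the division is not truncated to an integer)
    double losewin_ratio = (double)lose_votes / win_votes;
    int total_votes = up_votes + down_votes;
    return total_votes * losewin_ratio; // large means controversial
}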

This formula will produce high scores for stories that have a lot of votes and a near 50/50 voting split, and low scores for stories that have either few votes or many votes for one choice.
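As an illustration of how this score could be used for ranking, here is a Python sketch of the same formula applied to a few stories. The story names and vote counts are invented, and returning 0.0 for a story with no votes is an assumption:

```python
def controversy_score(up_votes, down_votes):
    """Total votes scaled by the loser/winner ratio; large means controversial."""
    win_votes = max(up_votes, down_votes)
    lose_votes = min(up_votes, down_votes)
    if win_votes == 0:
        return 0.0  # no votes yet (assumed to mean "not controversial")
    return (up_votes + down_votes) * (lose_votes / win_votes)

# Hypothetical stories: name -> (up_votes, down_votes)
stories = {
    "A": (50, 50),  # many votes, even split
    "B": (90, 10),  # many votes, one-sided
    "C": (3, 3),    # even split, few votes
}

ranked = sorted(stories, key=lambda name: controversy_score(*stories[name]), reverse=True)
print(ranked)
```

Story A scores 100.0, B about 11.1 and C 6.0, so the sort puts the heavily voted, evenly split story first.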

MusiGenesis 2008-11-16 12:41:54
