So I just built a star-rating system and am trying to come up with an algorithm to list the "Top Rated" items. For simplicity, here are the columns:

item_name
average_rating (a decimal from 1 to 5)
num_votes

I'm trying to determine the "sweet spot" between number of votes and rating. For example...

  • An item rated (4.6 / 20 votes) should be higher on the list than an item that's (5.0 / 2 votes)
  • An item rated (2.5 / 100 votes) should be below an item that's (4.5 / 2 votes)

So in other words, num_votes plays a factor in what's "Top".

Anyone know of an algorithm that is pretty good at determining this "sweet spot"?

Thanks in advance.

+2  A: 

The question is how much higher 4.6/20 should be ranked than 5.0/2...

One idea is to leave out of consideration any items that do not have at least x votes.
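
A minimal sketch of that cutoff, assuming each item is a plain (item_name, average_rating, num_votes) tuple; the function name top_rated and the parameter min_votes (the "x" above) are made up for illustration:

```python
def top_rated(items, min_votes):
    """Drop items with fewer than min_votes votes, then sort by average rating.

    items is assumed to be a list of (item_name, average_rating, num_votes) tuples.
    """
    eligible = [item for item in items if item[2] >= min_votes]
    # Highest average first; sorted() is stable, so ties keep their input order.
    return sorted(eligible, key=lambda item: item[1], reverse=True)


items = [("a", 4.6, 20), ("b", 5.0, 2), ("c", 2.5, 100)]
print(top_rated(items, min_votes=10))
```

The drawback is that items below the cutoff never show up at all, which is what the padding idea tries to soften.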

Another idea is to fill up with "medium" votes. Decide that 10 votes is the minimum. Then 5.0/2 must be filled up with 8 virtual votes of 2.5.

5.0/2 means 2 votes of 5.0; add 8 votes of 2.5 and you get 30/10 -> 3.0 ;)

Now you have to decide how many votes an item must have at minimum. Items that already have the minimum number of votes are compared directly.

4.5/20 > 4.4/100
5.0/2  < 3.1/20  (as 5.0/2 is, as we calculated, 3.0/10)
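
The padding rule could be sketched roughly like this. The function name padded_rating and the defaults (10 minimum votes, filler votes of 2.5) are taken from the example numbers above, not from any fixed rule:

```python
def padded_rating(average_rating, num_votes, min_votes=10, filler=2.5):
    """Fill an item up to min_votes with virtual votes worth `filler` each.

    Items that already have min_votes or more keep their plain average.
    """
    if num_votes >= min_votes:
        return average_rating
    missing = min_votes - num_votes
    # Real votes contribute average_rating each, virtual votes contribute filler.
    return (average_rating * num_votes + filler * missing) / min_votes


print(padded_rating(5.0, 2))   # 2 real votes of 5.0 plus 8 virtual votes of 2.5
print(padded_rating(3.1, 20))  # already above the minimum, unchanged
```

Sorting by padded_rating then puts 3.1/20 ahead of 5.0/2, as in the comparison above.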
tuergeist
What about 4/20 and 4/1000? Wouldn't 4/20=0.2 and 4/1000=0.004?
andho
4/1000 means an average vote of 4 across 1000 votes, not 4 divided by 1000 :|
tuergeist
+5  A: 

Here's another, statistically sound way: http://www.thebroth.com/blog/118/bayesian-rating
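
For readers who can't open the link, here is a rough sketch of a Bayesian average in its common form: each item's rating is pulled toward the overall average, and the pull gets weaker as the item collects more votes. The function name bayesian_ratings, the tuple layout, and the choice to use the mean of the per-item averages as the prior are assumptions for illustration; the linked post may define the terms slightly differently.

```python
def bayesian_ratings(items):
    """Return {item_name: bayesian_rating}.

    items is assumed to be a list of (item_name, average_rating, num_votes)
    tuples. Items with no votes are skipped.
    """
    rated = [item for item in items if item[2] > 0]
    if not rated:
        return {}
    # Prior strength: the average number of votes per rated item.
    avg_votes = sum(n for _, _, n in rated) / len(rated)
    # Prior value: the mean of the per-item averages (an assumption here).
    avg_rating = sum(r for _, r, _ in rated) / len(rated)
    return {
        name: (avg_votes * avg_rating + n * r) / (avg_votes + n)
        for name, r, n in rated
    }


items = [("A", 4.6, 20), ("B", 5.0, 2), ("C", 2.5, 100), ("D", 4.5, 2)]
scores = bayesian_ratings(items)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

With the four example items from the question, this ranks 4.6/20 above 5.0/2 and 4.5/2 above 2.5/100.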

longneck
To complement this, there's another option that's a bit more intense: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html Bayesian rating is probably much better, but it's an interesting alternative approach.
brianreavis
This solution is good, but it has the disadvantage that you need to know the average number of votes and ratings! That means more[!] MySQL queries for each rating calculation.
tuergeist
That evanmiller.org page is the one I was actually looking for, as that is also an excellent algorithm. I couldn't look it up at work because for some reason it's blocked by the content filter.
longneck
+1  A: 

How about you give each 10 votes a weight of 1, so 20 votes gives the item a weight of 2? Each point of weight adds 1% of the average to the score. If the item has 0 weight, it loses 0.5 from the average instead.

4.6/20 = 20/10: 2 weight
5.0/2 = 2/10: 0 weight

(4.6 * 0.02) + 4.6 = 4.692
(5.0 * 0.00) + 5.0 - 0.5 = 4.5

2.5/100 = 100/10: 10 weight
4.5/2 = 2/10: 0 weight

(2.5 * 0.1) + 2.5 = 2.75
(4.5 * 0.0) + 4.5 - 0.5 = 4
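
A small sketch of this scheme, reading the worked numbers above as "weight / 100 times the average is added as a bonus" (0.02 for weight 2, 0.1 for weight 10). The function name weighted_rating is made up for illustration:

```python
def weighted_rating(average_rating, num_votes):
    """Score an item using one point of weight per 10 full votes.

    Zero-weight items are penalised by 0.5; otherwise each point of weight
    adds 1% of the average, as inferred from the worked numbers above.
    """
    weight = num_votes // 10  # 20 votes -> 2, 2 votes -> 0
    if weight == 0:
        return average_rating - 0.5
    return average_rating + average_rating * weight / 100


for avg, votes in [(4.6, 20), (5.0, 2), (2.5, 100), (4.5, 2)]:
    print(f"{avg}/{votes} -> {weighted_rating(avg, votes):.3f}")
```

Note that the bonus grows without limit as votes accumulate, so a heavily voted item can end up with a score above 5.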
andho