views:

590

answers:

2
+1  Q: 

Weighted Mean

I have an existing web app that allows users to "rate" items based on their difficulty. (0 through 15). Currently, I'm simply taking the average of each user's opinion and presenting the average straight from MySQL. However, it's becoming clear to me (and my users) that weighting the numbers would be more appropriate.

Oddly enough, a few hours of Google-ing hasn't turned up much. I did find two articles that showed site-wide ratings systems based off of "Bayesian filters" (which I partially understand). Here's one example:

The formula is:

WR=(V/(V+M)) * R + (M/(V+M)) * C

Where:

* WR=Weighted Rating (The new rating)
* R=Average Rating (arithmetic mean) so far
* V=Number of ratings given
* M=Minimum number of ratings needed
* C=Arithmetic mean rating across the whole site

I like the idea here of ramping up the weighting based on the total number of votes per item...however, because the difficulty levels on my site can range drastically from item to item, taking "C" (arithmetic mean rating across the whole site) is not valid.

so, a restate of my question:

Using MySQL, PHP, or both, I'm try to get from aritmetic mean:

(5 + 5 + 4)/3 = 4.67 (rounded)

...to a weighted mean:

rating  / weight
5 / 2 (since it was given 2 times)
5 / 2
4 / 1

(sum[(rate * weight)])/(sum of weights)
(5 * 2) + (5 * 2) + (4 * 1) / (2 + 2 + 1)
(24)/(5)
= 4.8
+4  A: 

This is a simple example about how to do it in MySQL directly. You of course would need to add a condition on the subquery to get only the votes for the relevant item instead of all the votes.


mysql> create table votes( vote int);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into votes values (5),(5),(4);
Query OK, 3 row affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from votes;
+------+
| vote |
+------+
|    5 |
|    5 |
|    4 |
+------+
3 rows in set (0.00 sec)

mysql> select vote,count(vote),vote*count(vote) from votes group by vote;
+------+-------------+------------------+
| vote | count(vote) | vote*count(vote) |
+------+-------------+------------------+
|    4 |           1 |                4 |
|    5 |           4 |               20 |
+------+-------------+------------------+
2 rows in set (0.00 sec)

mysql> select sum(vt)/sum(cnt) FROM (select 
count(vote)*count(vote) as cnt,vote*count(vote)*count(vote) 
as vt from votes group by vote) a;
+------------------+
| sum(vt)/sum(cnt) |
+------------------+
|           4.8000 |
+------------------+
1 row in set (0.00 sec)


Vinko Vrsalovic
Isn'nt the exact thing as sum(vote) / count(*) ???5+5+5+5+4 = 24.24 / 5 votes = 4.8
vIceBerg
Indeed. I only translated the formula into SQL :-)
Vinko Vrsalovic
But he is not calculating a simple mean. Remember there were only three votes given in his example.
Dave DuPlantis
True, I got confused, as usually happens. Fixed
Vinko Vrsalovic
Perfect thanks. Final query:select sum(vt)/sum(cnt) FROM (select count(vote) as cnt,vote*count(vote) as vt from votes group by vote) a;(with a dynamic 'where section_id=$id' tacked on).
jmccartie
No, that'll do an arithmetic mean :) See fixed query.
Vinko Vrsalovic
Right, i see the diff now. Thanks.
jmccartie
A: 

What made it clear that weighting would be more appropriate? What are you seeing in an arithmetic mean that isn't helpful to you? I'm curious because it seems like the answer you are seeking might not necessarily meet your needs the best. (Also, a 16-point scale is typically much larger than what most people need; people rarely differentiate between so many points and tend to cluster their responses around a select group of answers.)

The concept you linked to pulls the mean toward the mean for the site; your mean simply pulls itself toward the most common response. Typically if you use a mean and wish to weight the responses, you would do so based on something about the respondents (putting more weight on responses from more knowledgeable people, people who frequent the site more, or other things like that).

You might also consider using calculations other than mean scores, maybe a top-N-box percentage (percentage of respondents giving the top N difficulty ratings).

Otherwise, the formula for your mean is sum(response * count * count) / sum(count * count) ...

select sum(response*ct*ct)/sum(ct*ct) from
( select response, count(response) as ct from your_table group by response) data

Apologies if the syntax isn't exact, I don't have MySQL at work.

Note that you may have to convert the sums from ints to floats; not sure exactly how that works in MySQL. In SQL Server, you have to cast one of the sums so it understands you don't want an integral mean.

Dave DuPlantis