I'm sorry if this is not an appropriate question for this site; if necessary, I'll close it. But maybe someone could give me an idea:

I'm trying to find a more sophisticated index for building a hierarchy (a ranking). For example:

5 votes from 6 = 83% AND

500 votes from 600 = 83%;

10 votes from 600 = 1.66%

If I rank by percentage, the first two will be in the same place, but I think 83% of 600 is more valuable than 83% of 6.

I could compare the raw counts (5, 10, 500), but that is not fair either: the third case (10 votes) would rank ahead of the first case (5 votes), even though the third case has only 1.66%.

Maybe someone could give me an idea of how to give more weight to the second case while still giving new entries a fair chance.

A: 

Compare percentages and when they are equal (or very close to equal), resolve the draw by comparing vote counts.

Kaniu
I don't think it will work. For example, if I accept a difference of, say, 5%, then 490 votes from 601 = 81.5% is within the 5% limit, and if I then rank by the total number of votes, this case will be in front of 500 from 600 = 83%.
How about comparing 490 to 500? Don't compare the total number of the votes, just the ones you are counting the percentage from.
Kaniu
I thought of this too, but I don't think it will work: if we replace the previous example with 501 from 610 = 82%, it will again rank higher than the second case, 500 from 600 = 83%.
You could compare votes only when the percentages are equal. Or use a threshold that is not constant; perhaps it could change based on the total number of votes.
Kaniu
+2  A: 

This is a standard problem that calls for a Bayesian solution. You are interested in the posterior mean of the proportion of votes to observations.

The simplest approach is to model the votes as coming from a Binomial distribution and specify a conjugate Beta prior with parameters alpha and beta. This leads to a posterior mean = (votes + alpha) / (n + alpha + beta). You can see how larger alpha and beta smooth the average towards a common mean.
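As a rough illustration of the formula above, here is a minimal Python sketch applied to the three cases from the question. The prior values alpha = 2 and beta = 2 are an assumption chosen only for the example, and the function name is hypothetical.

```python
def posterior_mean(votes, n, alpha=2.0, beta=2.0):
    """Posterior mean of a Binomial proportion under a Beta(alpha, beta) prior.

    alpha=2, beta=2 are illustrative defaults, not values from the thread.
    """
    return (votes + alpha) / (n + alpha + beta)


# (positive votes, total votes) for the three cases in the question
cases = {"5 of 6": (5, 6), "500 of 600": (500, 600), "10 of 600": (10, 600)}

scores = {name: posterior_mean(v, n) for name, (v, n) in cases.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

With this prior, 500 of 600 scores about 0.831, 5 of 6 scores 0.700, and 10 of 600 scores about 0.020, so the large sample at 83% ranks above the small one while the small one still ranks well above the 1.66% case.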

A better approach would be to set up a hierarchical model and estimate alpha and beta from the data. Matching moments will probably work well, although it is not fully Bayesian. This problem is isomorphic to the rats example in Gelman et al. (2003); Bolstad (2004) also has a chapter on the Binomial model.
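One way to read "matching moments" is to set the mean and variance of a Beta distribution equal to the mean and variance of the observed per-item proportions. The sketch below does that; it is a crude version that ignores the extra Binomial sampling noise in each proportion, and the function name and sample data are hypothetical.

```python
from statistics import mean, pvariance


def beta_moment_match(proportions):
    """Estimate Beta(alpha, beta) by matching the mean and variance
    of the observed proportions (method of moments)."""
    m = mean(proportions)
    v = pvariance(proportions)
    # A Beta distribution with mean m must have variance below m * (1 - m).
    if not 0 < v < m * (1 - m):
        raise ValueError("variance is incompatible with a Beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Hypothetical per-item proportions (votes / n), for illustration only
observed = [0.2, 0.4, 0.5, 0.6, 0.8]
alpha, beta = beta_moment_match(observed)
print(alpha, beta)
```

The resulting alpha and beta can then be plugged into the posterior mean (votes + alpha) / (n + alpha + beta) for every item.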

Tristan
@Tristan Thank you for the suggestions. I looked at them, but the problem is that this query will have to run maybe every 10 seconds, so it also has to be very, very fast. I'm thinking of something like y/t*100*0.3 + y/T*100*0.7, where y = an element from the series; t = total number of votes for y (positive and negative); T = total number of positive votes for all elements. So I gave the first index a weight of 30% (0.3) and the second a weight of 70% (0.7). What do you think?
In a simulation with 12 elements with widely dispersed values, the result is: case 1: 25.02; case 2: 27.40; case 3: 0.54; case 4 (5000 from 6000 = 83%): 49.04
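For comparison, the weighted index proposed in this comment can be sketched as below. It assumes y is the positive vote count of an element, and it uses only the three cases from the question, so T (and therefore the scores) differ from the 12-element simulation quoted above. The function and parameter names are hypothetical.

```python
def weighted_index(y, t, T, w_own=0.3, w_share=0.7):
    """30% weight on the element's own approval rate (y / t),
    70% weight on its share of all positive votes (y / T)."""
    return y / t * 100 * w_own + y / T * 100 * w_share


# (positive votes, total votes) for the three cases in the question
cases = {"5 of 6": (5, 6), "500 of 600": (500, 600), "10 of 600": (10, 600)}
total_positive = sum(y for y, _ in cases.values())  # T = 515 for these cases

index = {name: weighted_index(y, t, total_positive) for name, (y, t) in cases.items()}
order = sorted(index, key=index.get, reverse=True)

for name in order:
    print(f"{name}: {index[name]:.2f}")
```

Note that the second term depends on T, so every element's score shifts whenever any element receives a positive vote.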
If you want something fast and easy, just fix your alpha and beta at something reasonable and compute p = (alpha + votes) / (alpha + beta + n). If you have no data, p = alpha/(alpha + beta). With tons of data, p is approximately votes/n, since alpha and beta are small relative to n. You can think of alpha as fake votes and (alpha + beta) as the fake total. Look at the Beta distribution and choose sensible alpha and beta that match what you think or observe for the distribution of p. You can try other ideas, but this is really the correct statistical approach.
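The "fake votes" reading can be made explicit by rewriting p as a weighted average of the prior mean and the observed proportion. The weight w below is notation introduced here for illustration, and the decomposition applies for n > 0:

```latex
p = \frac{\alpha + \text{votes}}{\alpha + \beta + n}
  = w \cdot \frac{\alpha}{\alpha + \beta} + (1 - w) \cdot \frac{\text{votes}}{n},
\qquad
w = \frac{\alpha + \beta}{\alpha + \beta + n}
```

As n grows, w goes to 0 and p approaches votes/n; when n is small relative to alpha + beta, w is close to 1 and p stays near the prior mean alpha/(alpha + beta).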
Tristan
This is a simple program that you may find helpful http://www.epi.ucdavis.edu/diagnostictests/betabuster.html
gd047