views: 939
answers: 12

I have a game in which you can score from -40 to +40 on each match. Users are allowed to play any number of matches. I want to calculate a total score that implicitly takes into account the number of matches played.

Calculating only the average is not fair. For example, if Peter plays four games and gets 40 points on each match, he will have the same total score as Janne, who played only one match with 40 points.

Adding up the match scores isn't fair either. Peter plays 2 games (40 points on each match), total score 80. Janne plays 8 games (10 points on each match), total score 80.

Is there a simple and fair way to calculate the total score? I have read about the Elo and Glicko systems for chess ratings, but both are based upon a player's rating history and the opponents' ratings.

+1  A: 

Make the formula non-linear with regard to the number of games.

Let G be the number of games and S the sum of all game scores; then TotalScore = G^2 * S

Play around with it until you find something that seems logical.
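
A minimal sketch of this in Python (the function name and the tunable exponent parameter are mine; 2 is just the starting point from the formula above):

    def total_score(scores, exponent=2):
        # TotalScore = G^exponent * S, where G is the game count
        # and S is the sum of all game scores.
        g = len(scores)
        return g ** exponent * sum(scores)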

Roy
A: 

It depends how much you want to weight games played compared to the scores. You could define a function that returns a games-played weight: some smallish fraction for only one game, approaching 1 for a lot of games (e.g. 1 - 1/(2 * #Games)), and multiply that by the cumulative score.
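
A minimal sketch using the example weight above (the function name is mine, and it assumes at least one game has been played):

    def weighted_total(scores):
        # Games-played weight: 1/2 for a single game, approaching 1
        # as the number of games grows.
        weight = 1 - 1 / (2 * len(scores))
        return weight * sum(scores)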

Ian G
+1  A: 

You could check for win runs and give bonus points to consecutive wins (+5, +10, +15, ...), so (-10,+10,+10,+10,-10,+10) would give (-10,+10,+15,+20,-10,+10). You can also do this without resetting the bonus after a loss, which would give (-10,+10,+15,+20,-10,+25).

Another possibility would be to set the bonus value to 0 at the beginning and decrement it by 5 if the player loses, and increment it by 5 if the player wins.
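
A sketch of the run-bonus idea in Python (treating any positive score as a win is my assumption; the flag switches between the two variants above):

    def with_streak_bonus(scores, reset_on_loss=True):
        # Add +5, +10, +15, ... to consecutive wins; with
        # reset_on_loss=False the bonus survives losses (the "no runs" variant).
        adjusted, bonus = [], 0
        for s in scores:
            if s > 0:               # assumption: positive score = win
                adjusted.append(s + bonus)
                bonus += 5
            else:
                adjusted.append(s)
                if reset_on_loss:
                    bonus = 0
        return adjusted

For example, with_streak_bonus([-10, 10, 10, 10, -10, 10]) reproduces (-10, +10, +15, +20, -10, +10).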

schnaader
+1  A: 

I don't think there is a good way to capture this in a single number.

I would suggest calculating the average score and displaying the number of games alongside it. For example:

  • Peter scores 40/2 (average 40 pts in 2 games)
  • Janne scores 10/8 (average 10 pts in 8 games)

You can quickly see that when the second number is larger, the first number is more accurate.

Otherwise use Elo, but it is only accurate if each player plays at least 10 matches.

Wimmel
+1  A: 

You could set the score to be the average of the player's best 10 games from the last 30 (or some other numbers - maybe just the last 10 would suit you).

Players who haven't played 10 games yet could take the average of the games they have played, but then weight it towards 0, to compensate for the fact that the average of n < 10 has a higher standard deviation than the average of 10. Not sure what the scaling factor should be for each n, but if you have some past data to look at, you can figure out how variable the typical player's scores are and work from that.

Or work out what the global average score per game is (perhaps 0), and add (10-n) fake scores of that amount when calculating the score for a player with fewer than 10 games.
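
A sketch of that padding trick (the global average of 0 is the answer's guess, and the threshold of 10 games follows the text):

    def padded_average(scores, target=10, global_avg=0):
        # Pad with fake scores at the global average until the player
        # has `target` games' worth of data, then take the mean.
        padding = max(0, target - len(scores))
        return (sum(scores) + padding * global_avg) / (len(scores) + padding)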

Steve Jessop
+2  A: 

You could look into Microsoft's TrueSkill. I read about it a few months back and have honestly forgotten most of the details, so I'm not sure it's super appropriate, but it might be a good inspiration.

grapefrukt
A: 

Another starting point could be the Wikipedia article on the Elo chess rating system.

Harald Scheirich
+3  A: 

Another approach would be to use Bayesian statistics. Model the probability of each team winning as a beta distribution and calculate the probability of a sample from one distribution being larger than a sample from the other. This approach is used to test cancer drugs. It takes into account not only which drug has a better response rate but also which drug has more data. Comparing two players or two teams is exactly analogous.

This may sound more complicated than it is, but there is free software for doing these calculations, and in some cases the calculations are easy enough to do by hand.

See an introduction to random inequalities and specifics about beta distribution inequalities.
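
A Monte Carlo sketch of the beta-inequality calculation (the win/loss framing and the uniform Beta(1, 1) prior are my assumptions; the linked material gives exact formulas):

    import numpy as np

    def prob_a_beats_b(wins_a, losses_a, wins_b, losses_b, samples=100_000):
        # Model each player's win probability as Beta(wins+1, losses+1)
        # and estimate P(p_a > p_b) by sampling.
        rng = np.random.default_rng()
        p_a = rng.beta(wins_a + 1, losses_a + 1, samples)
        p_b = rng.beta(wins_b + 1, losses_b + 1, samples)
        return float((p_a > p_b).mean())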

John D. Cook
This is really smart. But how do you turn this into a simple metric, i.e., a number for each player that gives their score?
dreeves
+2  A: 

It depends what you want to accentuate, but I think this is both simple and effective:

average score + games played

You could weight the variables a bit (e.g. 2 × games played, if you want games played to have more of an impact) - but the basic relationship seems reasonable.

In your first example Peter would have 44 and Janne would have 41 - but if Peter started losing points, Janne could catch up.
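
As a sketch (the weight parameter is the "2 × games played" knob mentioned above; the function name is mine):

    def simple_score(scores, games_weight=1):
        # Average score plus (optionally weighted) games played.
        return sum(scores) / len(scores) + games_weight * len(scores)

simple_score([40] * 4) gives Peter's 44; simple_score([40]) gives Janne's 41.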

Phil Nash
Thanks Phil, this is exactly what I was looking for. The best total score should be given to the player having the highest point sum with the lowest game count.
Heh. Maybe SO should work that way too. Accepted answer, but no votes. Only on SO, eh ;-)
Phil Nash
I feel like this is too arbitrary, having to pull those weights out of thin air. I'm partial to my perhaps overly mathematical method: using the lower end of a mean confidence interval. It's like a pessimistic estimate of what the true average will be after enough games.
dreeves
Also, one problem with this method is that it has a sort of ceiling... If a user played several hundred games, the average score would not have enough influence, as the average score is bounded (-40 to +40) but games played is unbounded. I think this solution only delays the problem: it works well in the short term but not in the long term. It puts too much weight on the number of games played.
Ricket
Sorry, however, I don't really know what solution to suggest instead. I'm aware I'm kind of being anti-helpful, but I just wanted to point it out as a consideration. Also, I just realized this question is extremely old. Sorry about that too.
Ricket
Apology accepted. SO is all about the "long tail", so a question is only *old* if it's no longer relevant. You raise a very valid issue. This technique is only useful for "small numbers" of iterations - for some value of "small numbers". IIRC there were some answers here that did better, although at the cost of some extra complexity - dreeves' answer (he also commented here) springs to mind.
Phil Nash
+1  A: 

I recommend making the Game Score be the lower end of a 95% confidence interval. In the limit, as you play a lot of games, your Game Score approaches your average score, though it is always strictly less. It's like using the average score but being appropriately skeptical of people who only played a few games and may have just been lucky.

Said another way, it's a pessimistic estimate of what the true average will be after enough games are played.

How to compute the 95% confidence interval without storing the entire list of scores: http://stackoverflow.com/questions/282600

Alternatively, if you keep track of the number of games played, the sum of the person's scores, and the sum of the squares of their scores, you can compute the standard error as follows:

SE = sqrt((ss - s^2/n) / (n-1) / n)

Instead of bothering with the 95% CI, you could just let the Game Score be:

s/n - SE

Note that the above is negative infinity when only one game has been played. That implies you'd give someone who's played only one game the lowest possible score as their Game Score.
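
A sketch of that running calculation (the function name is mine; only n, s, and ss need to be stored per player):

    import math

    def game_score(n, s, ss):
        # n = games played, s = sum of scores, ss = sum of squared scores.
        if n < 2:
            return float("-inf")    # one game scores lowest, per the note above
        variance = (ss - s * s / n) / (n - 1)    # sample variance
        se = math.sqrt(max(variance, 0.0) / n)   # standard error of the mean
        return s / n - se

Subtracting 1.96 * se instead gives (roughly) the lower end of the 95% CI discussed above.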

Another idea is to explicitly show the confidence interval when ranking people (sorted by the low end). Then people would be playing more both to shrink their CI and to increase their average.

Finally, it might make sense to weight more recent games more so that an isolated bad game decays in significance more quickly. The way to do that would be to pick a discount factor d greater than 1 and give the ith game a weight of d^(i-1). (Although then I'm no longer sure how to apply the confidence interval idea.)
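
A sketch of that recency weighting, assuming scores are stored in chronological order (the value of d is arbitrary):

    def discounted_average(scores, d=1.05):
        # The ith game gets weight d^(i-1), so more recent games
        # count for more and an old bad game decays in influence.
        weights = [d ** i for i in range(len(scores))]
        return sum(w * x for w, x in zip(weights, scores)) / sum(weights)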

PS: I expanded on this idea here: http://stackoverflow.com/questions/895009/how-to-calculate-mean-based/895255#895255

dreeves
That sounds very useful. I'd probably replace the "average score" part of my proposal with something like that. I'd still keep the "number of games" term, though. I wouldn't mind seeing the formula you had in mind too.
Phil Nash
Thanks for posting the formula. I think that would improve things by weighting the average further in favour of repeated successes.
Phil Nash
To piggyback on the "old games decay" idea... you're essentially describing a weighted moving average, and indeed classical CIs break down there. You could construct a Bayesian credible interval, though, but it'd be a bit messy. Going Bayesian can also get rid of the "one game" problem.
Gregg Lind
A: 

Construct a graph, with each person represented by a vertex. Each edge in the graph represents a series of matches between two players. Now apply some type of PageRank algorithm to give you a set of weights over the vertices. That should give you your ranking.

Now the tricky part is picking the edge weights used in PageRank. For the directed edge (u,v) -- from vertex u to vertex v -- I would personally assign a weight equal to the number of points player u has won against player v.

You can add vertices to your graph whenever you like, but remember that PageRank favours older vertices (i.e. those that have played more games!). Anyway, for a reference:

http://dbpubs.stanford.edu:8090/pub/1999-66
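
A sketch of the graph construction using networkx (the players and results are made up; the edge direction and weights follow the suggestion above):

    import networkx as nx

    # (u, v, points) means player u won `points` against player v.
    results = [("peter", "janne", 40), ("janne", "peter", 10),
               ("peter", "alex", 25), ("alex", "janne", 15)]

    G = nx.DiGraph()
    for u, v, pts in results:
        if G.has_edge(u, v):
            G[u][v]["weight"] += pts    # accumulate repeated pairings
        else:
            G.add_edge(u, v, weight=pts)

    ranks = nx.pagerank(G, weight="weight")    # {player: rank weight}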

An alternative idea is to use ELO ratings, and to try to bootstrap it by assigning everyone the same score to start with, and then propagate a score forwards. I can't say this is entirely satisfactory though.

Ying Xiao
+2  A: 

Here's a principled approach:

http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
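
The article sorts by the lower bound of the Wilson score interval on a Bernoulli proportion; here's a sketch, where applying it to wins out of games played is my adaptation:

    import math

    def wilson_lower_bound(wins, games, z=1.96):
        # Lower bound of the Wilson score interval; z=1.96 ~ 95% confidence.
        if games == 0:
            return 0.0
        phat = wins / games
        z2 = z * z
        centre = phat + z2 / (2 * games)
        margin = z * math.sqrt((phat * (1 - phat) + z2 / (4 * games)) / games)
        return (centre - margin) / (1 + z2 / games)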

dreeves