I am working on an algorithm to score individual players in a team-based game. The problem is that no fixed teams exist - every time 10 players want to play, they are divided into two (somewhat) even teams and play each other. For this reason, it makes no sense to score the teams, and instead we need to rely on individual player ratings.

There are a number of problems that I wish to take into account:

  • New players need some sort of provisional ranking to reach their "real" rating, before their rating counts the same as seasoned players.
  • The system needs to take into account that a team may consist of a mix of player skill levels - eg. one really good, one good, two mediocre, and one really poor. Therefore a simple "average" of player ratings probably won't suffice and it probably needs to be weighted in some way.
  • Ratings are adjusted after every game and as such the algorithm needs to be based on a per-game basis, not per "rating period". This might change if a good solution comes up (I am aware that Glicko uses a rating period).

Note that cheating is not an issue for this algorithm, since we have other measures of validating players.

I have looked at TrueSkill, Glicko and ELO (which is what we're currently using). I like the idea of TrueSkill/Glicko where you have a deviation that is used to determine how precise a rating is, but none of the algorithms take the random teams perspective into account and seem to be mostly based on 1v1 or FFA games.

It was suggested somewhere that you rate players as if each player from the winning team had beaten all the players on the losing team (25 "duels"), but I am unsure if that is the right approach, since it might wildly inflate the rating when a really poor player is on the winning team and gets a win vs. a very good player on the losing team.

Any and all suggestions are welcome!

EDIT: I am looking for an algorithm for established players + some way to rank newbies, not the two combined. Sorry for the confusion.

There is no AI and players only play each other. Games are determined by win/loss (there is no draw).

+2  A: 

Provisional ranking systems are always imperfect, but the better ones (such as Elo) are designed to adjust provisional ratings more quickly than for ratings of established players. This acknowledges that trying to establish an ability rating off of just a few games with other players will inherently be error-prone.

I think you should use the average rating of all players on the opposing team as the input for establishing the provisional rating of the novice player, but handle it as just one game, not as N games vs. N players. Each game is really just one data sample, and the Elo system handles accumulation of these games to improve the ranking estimate for an individual player over time before switching over to the normal ranking system.

For simplicity, I would also not distinguish between established and provisional ratings for members of the opposing team when calculating a new provision rating for some member of the other team (unless Elo requires this). All of these ratings have implied error, so there is no point in adding unnecessary complications of probably little value in improving ranking estimates.

Joel Hoff

how is the 'scoring' settled?,

if a team would score 25 points in total (scores of all players in the team) you could divide the players score by the total team score * 100 to get the percentage of how much that player did for the team (or all points with both teams).

You could calculate a score with this data, and if the percentage is lower than i.e 90% of the team members (or members of both teams): treat the player as a novice and calculate the score with a different weighing factor.

sometimes an easier concept works out better.

Edited it in, but games are simply won or lost, no scores. There are several reasons for this, but the main one is that scoring is not really an accurate reflection of the work the player has done for the team.
Christian P.
@Christian P.: it seems then that you need to think out something where the players get scored based on 'positive' and 'negative' actions, a scoring model based on no hard definition of what makes a player better or worse is as also implied below eventually error prone

The first question has a very 'gamey' solution. you can either create a newbie lobby for the first couple of games where the players can't see their score yet until they finish a certain amount of games that give you enough data for accurate rating.
Another option is a variation on the first but simpler-give them a single match vs AI that will be used to determine beginning score (look at quake live for an example).

+1  A: 

First off: It is very very unlikely that you will find a perfect system. Every system will have a flaw somewhere.

And to answer your question: Perhaps the ideas here will help: Lehman Rating on OkBridge.

This rating system is in use (since 1993!) on the internet bridge site called OKBridge. Bridge is a partnership game and is usually played with a team of 2 opposing another team of 2. The rating system was devised to rate the individual players and caters to the fact that many people play with different partners.

+1  A: 

There was an article in Game Developer Magazine a few years back by some guys from the TrueSkill team at Microsoft, explaining some of their reasoning behind the decisions there. It definitely mentioned teams games for Xbox Live, so it should be at least somewhat relevant. I don't have a direct link to the article, but you can order the back issue here:

One specific point that I remember from the article was scoring the team as a whole, instead of e.g. giving more points to the player that got the most kills. That was to encourage people to help the team win instead of just trying to maximize their own score.

I believe there was also some discussion on tweaking the parameters to try to accelerate convergence to an accurate evaluation of the player skill, which sounds like what you're interested in.

Hope that helps...


Without any background in this area, it seems to me a ranking systems is basically a statistical model. A good model will converge to a consistent ranking over time, and the goal would be to converge as quickly as possible. Several thoughts occur to me, several of which have been touched upon in other postings:

  1. Clearly, established players have a track record and new players don't. So the uncertainty is probably greater for new players, although for inconsistent players it could be very high. Also, this probably depends on whether the game primarily uses innate skills or acquired skills. I would think that you would want a "variance" parameter for each player. The variance could be made up of two parts: a true variance and a "temperature". The temperature is like in simulated annealing, where you have a temperature that cools over time. Presumably, the temperature would cool to zero after enough games have been played.
  2. Are there multiple aspects that come in to play? Like in soccer, you may have good shooters, good passers, guys who have good ball control, etc. Basically, these would be the degrees of freedom in you system (in my soccer analogy, they may or may not be truly independent). It seems like an accurate model would take these into account, of course you could have a black box model that implicitly handles these. However, I would expect understanding the number of degrees of freedom in you system would be helpful in choosing the black box.
  3. How do you divide teams? Your teaming algorithm implies a model of what makes equal teams. Maybe you could use this model to create a weighting for each player and/or an expected performance level. If there are different aspects of player skills, maybe you could give extra points for players whose performance in one aspect is significantly better than expected.
  4. Is the game truly win or lose, or could the score differential come in to play? Since you said no ties this probably doesn't apply, but at the very least a close score may imply a higher uncertainty in the outcome.
  5. If you're creating a model from scratch, I would design with the intent to change. At a minimum, I would expect there may be a number of parameters that would be tunable, and might even be auto tuning. For example, as you have more players and more games, the initial temperature and initial ratings values will be better known (assuming you are tracking the statistics). But I would certainly anticipate that the more games have been played the better the model you could build.

Just a bunch of random thoughts, but it sounds like a fun problem.

Chris Morlier

Every time 10 players want to play, they are divided into two (somewhat) even teams and play each other.

This is interesting, as it implies both that the average skill level on each team is equal (and thus unimportant) and that each team has an equal chance of winning. If you assume this constraint to hold true, a simple count of wins vs losses for each individual player should be as good a measure as any.