+6  A: 

This might not be exactly what flickchart is doing, but you could use a variant of the ELO algorithm used in chess (and other sports), since these are essentially fights/games that they win/lose.

Basically, all movies start off with the same rating (no wins or losses) and every time they win a fight they gain a certain number of points. The average gain is usually around 20 (but any number will do): winning against a movie with the same rating as yourself gives exactly those 20 points. Winning against a worse movie might give only around 10 points, while winning against a better movie might give you 30. The other way around: losing to a better movie costs you only 10 points, but losing to a worse movie costs you 30.

The specifics of the algorithm are in the Wikipedia link.
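Since these wins and losses map directly onto the Elo model, the update can be sketched in a few lines of Python. This is not necessarily what flickchart runs; the starting rating of 1200 and K = 40 are assumptions, chosen so that beating an equally rated opponent gains the 20 points mentioned above:

```python
# Sketch of an Elo-style update for movie "fights". K = 40 makes a
# win over an equal-rated movie worth exactly 20 points.

def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(winner_rating, loser_rating, k=40):
    """Return the new (winner, loser) ratings after one fight."""
    exp = expected_score(winner_rating, loser_rating)
    gain = k * (1.0 - exp)   # small vs. weak opponents, large vs. strong
    return winner_rating + gain, loser_rating - gain

print(elo_update(1200, 1200))  # (1220.0, 1180.0) -- equal opponents
```

Note how the gain automatically scales with the opponent's strength: beating a much stronger movie yields nearly the full K points, beating a much weaker one yields almost nothing.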

Christian P.
Thank you! But this can only be a little part of the algorithm. Imagine you have already rated 20 movies - several times each. On rank 5 there is "Titanic". Then you have the choice between "Titanic" and "Shrek". You haven't rated "Shrek" so far. But you prefer "Shrek" here. With your score system only, "Shrek" would enter the ranking at rank 11 or so. But it must be before "Titanic", right?
I assumed you wanted the scores for all people, not just yourself. For your "personal" top movie list, you'd need more than this or you would have to score all the movies a lot more times each for it to be accurate :)
Christian P.
Ok, thanks. So it's clear now :)
+1 for providing the same answer I thought of when I saw the question. ELO is good for 1 v 1 fights ala chess.
Andrew
Ok. So ELO seems to be a good solution, maybe not the best. But it will definitely work with ELO. :) But: Item A wins lots of fights and goes to position one according to ELO. Item B has just a few wins but many losses but it beats item A. According to ELO item B should be at the bottom. But according to another model item B should be on the 1st place!?
Number of wins does not determine your placement, it's WHO you win against. Imagine being #100 on the world rankings in Tennis. If you beat #1 three times, you're not just moved a couple of places up, you will skyrocket. If you beat #99 three times, you will at most move 1-2 places up.
Christian P.
The weakness is that ELO is designed with the expectation that A should beat B sometimes even if B is better than A. We have no such expectations when we rank our preferences. (Although if the question were "What are you more in the mood for?" rather than "What do you prefer?" or "Which is better?" then maybe.) Flickchart, for everything else it does badly (see my post) is consistent on this point. When A beats B, A advances past B, every time.
David Berger
@marco: For your personal list, all this can do is provide a loose sorting from worst to best. And a potentially broken one, since an inattentive user might say he likes A better than B better than C better than A.
Brian
+1  A: 

Or you might want to use a variant of PageRank see prof. Wilf's cool description.
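As a hedged sketch of the eigenvector method from Prof. Wilf's description: build a matrix of wins and power-iterate to approximate its dominant eigenvector. The film names and results below are invented, and the damping term is borrowed from PageRank as an assumption to keep scores from collapsing when the win graph has no cycles:

```python
# Rank items by the dominant eigenvector of the win matrix,
# approximated by power iteration with PageRank-style damping.

def eigen_rank(items, wins, d=0.85, iterations=100):
    """wins maps (winner, loser) -> number of such results."""
    score = {i: 1.0 / len(items) for i in items}
    for _ in range(iterations):
        new = {}
        for i in items:
            # Each item inherits importance from every item it beat.
            beaten = sum(score[j] * wins.get((i, j), 0) for j in items)
            new[i] = (1.0 - d) / len(items) + d * beaten
        total = sum(new.values())
        score = {i: s / total for i, s in new.items()}  # normalise
    return sorted(items, key=lambda i: -score[i])

films = ["A", "B", "C"]
results = {("A", "B"): 2, ("B", "C"): 2, ("A", "C"): 1}
print(eigen_rank(films, results))  # A beat everyone, so A ranks first
```

This captures the "beating a strong item is worth more" property directly: a win transfers some of the loser's accumulated importance to the winner.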

JanHudecek
Thank you, good idea. This could work. The importance of an item is then the sum of the importance values of the beaten items, right?
Won't work immediately: PageRank assumes an undirected, cyclic graph, these ratings give a bipartite graph. This means that the algorithm has nothing to base the value of a link from a voter on. Not difficult to come up with ways of doing this, but the algorithm is incomplete until you've done so.
Charles Stewart
Thank you for this piece of information. So you need to define one film as "good" to make the calculations for the rest possible?
As I understood the question, you would get a directed graph (film a is better than film b). It doesn't have to be bipartite - there will be triangles. But I think the algorithm would be to make the matrix "A whose (i, j) entry is 1 if web site j links to web site i, and is otherwise 0", and then "you would find its eigenvectors and look for the one in which all of the entries have the same sign. The largest entry would tell you the team of first rank, the next largest entry would belong to the team of second rank".
JanHudecek
@Charles Stewart PageRank does not assume an undirected graph. Links are directed edges: http://en.wikipedia.org/wiki/PageRank#Simplified_algorithm
charstar
The way I had assumed to represent votes is as a pair of edges, one positive from a voter to the favoured film, the other negative to the disfavoured film. Then the graph is bipartite, since all edges flow from voters to films. But if you made the negative vote an edge from the film to the voter, then I'm guessing PageRank would work.
Charles Stewart
@Charles Stewart I think you're getting a little off by including users as nodes in the graph. To make a model out of one user, you could do exactly what PageRank does and make an edge from winner to loser. For multiple users, you might be able to do the same thing, but with edge-weights based on averages across users. I'm not sure whether that will necessarily be solvable though, so it's just an idea.
David Berger
@David: Unless you wish to treat all votes equally (!), then you need to represent some of what you know about the voter in the graph. I think having voters as nodes, rather than, say labels on edges, will give a much simpler analysis.
Charles Stewart
A: 

http://en.wikipedia.org/wiki/Maximize_Affirmed_Majorities?

(Or the BestThing voting algorithm, originally called the VeryBlindDate voting algorithm)

Stobor
Thank you. On the page for the "BestThing algorithm" there's a description for what the algorithm does: Two items are shown and you choose the one you like best. This is exactly what I'm looking for. But is it really that easy? Simply dividing the number of wins by the total number of fights?
+4  A: 

How are the ratings calculated? How do you decide which film takes first place in the ranking? You have to consider how often an item wins and how good the beaten items are.

What you want is a weighted rating, also called a Bayesian estimate.

I think IMDB's Top 250 movies is a better starting point to make a ranking website. Some movies have 300,000+ votes while others have fewer than 50,000. IMDB uses a Bayesian estimate to rank movies against one another without unfairly weighting popular movies. The algorithm is given at the bottom of the page:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C where:

  • R = average for the movie (mean) = (Rating)
  • v = number of votes for the movie = (votes)
  • m = minimum votes required to be listed in the Top 250 (currently 3000)
  • C = the mean vote across the whole report (currently 6.9)

for the Top 250, only votes from regular voters are considered.

I don't know how IMDB chose 3000 as their minimum vote. They could have chosen 1000 or 10000, and the list would have been more or less the same. Maybe they're using "average number of votes after 6 weeks in the box office" or maybe they're using trial and error.

In any case, it doesn't really matter. The formula above is pretty much the standard for normalizing votes on ranking websites, and I'm almost certain Flickchart uses something similar in the background.

The formula works so well because it "pulls" ratings toward the mean, so ratings above the mean are slightly decreased, ratings below the mean are slightly increased. However, the strength of the pull is inversely proportional to the number of votes a movie has. So movies with few votes are pulled more aggressively toward the mean than movies with lots of votes. Here are two data points to demonstrate the property:

Rank  Movie            Votes            Avg Rating        Weighted Rating
----  -----            -----            ----------        ---------------
219   La Strada        15,000+          8.2               8.0
221   Pirates of the   210,000+         8.0               8.0
      Caribbean 2

Both movies' ratings are pulled down, but the pull on La Strada is more dramatic since it has fewer votes, and its rating is therefore less representative than PotC's.


For your specific case, you have two items in a "fight". You should probably design your table as follows:

Items
-----
ItemID (pk)
FightsWon (int)
FightsEngaged (int)

The average rating is FightsWon / FightsEngaged. The weighted rating is calculated using the formula above.

When a user chooses a winner in a fight, increase the winning item's FightsWon field by 1, and increase both items' FightsEngaged fields by 1.
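As a sketch, the whole scheme in Python (the field names follow the table above; the values of m and C are invented here - in practice you would derive them from your own data, as IMDB does):

```python
# Weighted rating (Bayesian estimate) for the Items table above:
# WR = (v/(v+m))*R + (m/(v+m))*C, with R = FightsWon / FightsEngaged.

def weighted_rating(fights_won, fights_engaged, m, C):
    """m: minimum fights to be fully trusted; C: global mean rating."""
    if fights_engaged == 0:
        return C                 # no data yet: fall back to the global mean
    v = fights_engaged
    R = fights_won / fights_engaged
    return (v / (v + m)) * R + (m / (v + m)) * C

# An item with 8 wins in 10 fights (R = 0.8) is pulled hard toward the
# global mean C = 0.5, because 10 fights is far below m = 100.
print(weighted_rating(8, 10, m=100, C=0.5))  # ~0.527
```

This shows the "pull" property from the answer: the fewer fights an item has, the closer its weighted rating sits to the mean.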

Hope this helps! - Juliet

Juliet
Thank you very much! This is a very interesting algorithm for normal votings. But in this case, you should consider the opponent in the "fight", shouldn't you? Winning against "La Strada" is easier than winning against "The Dark Knight", isn't it?
Remember, IMDB users self-select movies they vote on, so there's not an equally likely chance of ranking La Strada against the Dark Knight. A "fight" website would presumably make fights between two items equally likely, not like the self-selected votes for IMDB movies -- however, you'll probably see a significant pull toward the mean for newly added items, simply because they haven't had much time to engage in fights. Highly rated new items *should* correct themselves over time, usually after they earn 5x the minimum vote for the top 10.
Juliet
Downvoters, please explain!
Juliet
Thank you for the additional explanation, Juliet. Your algorithm from IMDB is great for normal votings where I can self-select the items to vote on. But for my purpose, 1vs1 fights, it isn't the best one, I think.
@Juliet I didn't down vote you but I'd imagine it is because using a naïve bayesian estimator makes the _assumption of independence_. While what you have posted corrects for _grade inflation_ it does not take into account changes in rank of other items over time (dependence). Consider items A, B, C, and D. The user chooses B over A, then D over C, then later C over B. B and C both now have two engagements and one win however what should have been inferred is D > C > B > A. The model must be expressible as an acyclic weighted digraph.
charstar
@cda: Let's say we have A, B, C, and D: B wins over A 70% of the time, D wins over C 100% of the time, D wins over B 100% of the time, and C wins over A 70% of the time. Can you rightfully conclude D > B > C > A? No, not without information on the number of fights. If A, B, and C have been engaged in 1000s of fights, and D has only been engaged in a single fight each against C and B (hence a perfect record), then you can't conclude anything about the relative strength of D. Would a graph representation really differ from a bayesian estimator after 10s of 1000s of nodes are added to the graph?
Juliet
@Juliet Using bayesian methods would only work for determining the _global_ ranking in this case. The OP is talking about an explicit chain of preferences by the user (not a fitness/strength/performance metric). Consider this: the user votes on A through Y and it just so happens that he likes them in alphabetical order, except Z, which he likes better than A. It's only one engagement and one win, but the user expects Z to now be ranked on top ("of all the movies I've voted on, this is even better than the current top one").
charstar
Thank you charstar, this is exactly what is missing in this answer. But for other purposes, the answer is great. There are many use cases for this code. Thank you very much, Juliet!
+1  A: 

As for flickchart, I've been playing around with it a little bit, and I think the rating system is pretty unsophisticated. In pseudo-code, my guess is that it looks something like this:

if rank(loser) == null and rank(winner) == null
    insert loser at position estimated from global rank
    insert winner at position estimated from global rank
else if rank(winner) == null or rank(winner) < rank(loser)
    then advance winner to loser's position and demote loser and all following by 1

Why do I think this? First, I'm completely convinced that their Bayesian priors are not based on a careful mining of my previous choices. They seem to have no way to guess that because I like Return of the Jedi that I like The Empire Strikes Back. In fact, they can't figure out that because I've seen Home Alone 2 that I may have seen Home Alone 1. After hundreds of ratings, the choice hasn't come up.

Second of all, if you look at the above code you might find a little bug, which you will definitely notice on the site. You may notice that sometimes you will make a choice and the winner will slide by one. This seems to only happen when the loser wasn't previously added. My guess is that what is happening is that the loser is being added higher than the winner.

Other than that, you will notice that rankings do not change at all unless a lower ranked movie beats a higher ranked movie directly. I don't think any real scores are being kept: the site seems to be entirely memoryless except for the ordinal rank of each movie and your most recent rating.
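The pseudo-code above can be made runnable, reading the rank test as "winner currently ranked below the loser" (which seems to be the intent). The `estimate_position` helper is a placeholder for however the site derives a starting slot from the global ranking - the post only guesses at that part:

```python
# Runnable sketch of the guessed Flickchart behaviour above.

def estimate_position(movie, ranking):
    return len(ranking)            # placeholder: seed new items at the bottom

def record_fight(ranking, winner, loser):
    """ranking: list of titles, best first; mutated in place."""
    if winner not in ranking and loser not in ranking:
        ranking.insert(estimate_position(loser, ranking), loser)
        ranking.insert(estimate_position(winner, ranking), winner)
    elif winner not in ranking or ranking.index(winner) > ranking.index(loser):
        if winner in ranking:
            ranking.remove(winner)
        # Winner takes the loser's slot; loser and everything below shift down.
        ranking.insert(ranking.index(loser), winner)

ranking = ["Alien", "Titanic"]
record_fight(ranking, winner="Shrek", loser="Alien")
print(ranking)  # ['Shrek', 'Alien', 'Titanic']
```

Note that with a naive `estimate_position`, a brand-new loser can be inserted above a brand-new winner - which is exactly the slide-by-one glitch described above.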

David Berger
You have forgotten some closing brackets in your pseudo-code, haven't you? And in both statements there is rank(winner)==null so this doesn't make sense really. Do you think it is that easy???
I'm not using any opening brackets...so why would there be closing brackets? Why on earth would I use brackets for pseudo-code? There are parentheses problems and I'll fix those. I don't claim this is a good algorithm, I claim it's how flick-chart works. Experiment with it and offer a counter-example. If it were more complex, don't you think that when you rank A versus B, there ought to be the *possibility* that C changes (more than just being down-shifted by one)?
David Berger
Sorry, I wanted to say parentheses. This is what I meant. ;) I'll test your code. Thank you very much!
Your algorithm seems to be correct. But I would replace "demote loser by 1" by "demote all items following by 1". And: If the winner is in your own ranking but the loser not yet, the winner advances one position.
Another thing which would have been interesting: How to calculate the global ranking? This is very important since your algorithm uses it.
@marco92w "If the winner is in your own ranking but the loser not yet, the winner advances one position." Surprisingly, no! I've seen the opposite happen! The loser just gets inserted (I *think* based on global but it's really just a guess), *frequently at a higher position than the winner.* This was what first tipped me off that the algorithm was not so sophisticated.
David Berger
A: 

I believe this kind of 1 vs. 1 scenario might be a type of conjoint analysis called Discrete Choice. I see these fairly often in web surveys for market research. The customer is generally asked to choose between two+ different sets of features that they would prefer the most. Unfortunately it is fairly complicated (for a non-statistics guy like myself) so you may have difficulty understanding it.

Joe Philllips
Thank you, d03boy. I think this isn't really the algorithm I'm looking for. It is used in economics but it isn't useful for evaluating 1vs1 fight results, is it?
+1  A: 

I've been toying with the problem of ranking items by means of pair-wise comparison for some time myself, and wanted to take the time to describe the ideas I came up with so far.

For now I'm simply sorting by <fights won> / <total fights>, highest first. This works fine if you're the only one voting, or if there are a lot of people voting. Otherwise it can quickly become inaccurate.

One problem here is how to choose which two items should fight. One thing that does seem to work well (subjectively) is to let the item that has the least fights so far fight against a random item. This leads to a relatively uniform number of fights for the items (-> accuracy), at the cost of possibly being boring for the voter(s). They will often be comparing the newest item against something else, which is kinda boring. To alleviate that, you can choose the n items with the lowest fight-count and choose one of those randomly as the first contender.
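That selection rule is small enough to sketch directly (the film names and counts here are invented):

```python
import random

# Pick the first fighter at random from the n least-fought items,
# and the second at random from everything else.

def pick_fight(fight_counts, n=5):
    """fight_counts: dict mapping item -> number of fights so far."""
    least_fought = sorted(fight_counts, key=fight_counts.get)[:n]
    first = random.choice(least_fought)
    second = random.choice([i for i in fight_counts if i != first])
    return first, second

counts = {"new film": 0, "older film": 12, "classic": 30}
print(pick_fight(counts, n=1))  # always pairs "new film" with another film
```

Raising n trades accuracy for variety: the voter sees fewer repeats of the newest item, at the cost of a slightly less uniform fight distribution.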

You mentioned that you want to make victories against strong opponents count more than against weak ones. As mentioned in other posts above, rating systems used for chess and the like (Elo, Glicko) may work. Personally I would love to use Microsoft's TrueSkill, as it seems to be the most accurate and also provides a good way to pick two items to pit against each other -- the ones with the highest draw-probability as calculated by TrueSkill. But alas, my math understanding is not good enough to really understand and implement the details of the system, and it may be subject to licensing fees anyway...

Collective Choice: Competitive Ranking Systems has a nice overview of a few different rating systems if you need more information/inspiration.

Other than rating systems, you could also try various simple ladder systems. One example:

  1. Randomize the list of items, so they are ranked 1 to n
  2. Pick two items at random and let them fight
  3. If the winner is ranked above the loser: Do nothing
  4. If the loser is ranked above the winner:
    • If the loser is directly above the winner: Swap them
    • Else: Move the winner up the ladder x% toward the loser of the fight.
  5. Goto 2

This is relatively unstable in the beginning, but should improve over time. It never ceases to fluctuate though.
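A minimal sketch of the ladder steps above (the x% move is a free parameter; 50% is used here purely as an example):

```python
# Ladder update: `ladder` is a list, rank 1 first.

def ladder_fight(ladder, winner, loser, x=0.5):
    w, l = ladder.index(winner), ladder.index(loser)
    if w < l:                      # winner already ranked above: do nothing
        return
    if l == w - 1:                 # loser directly above the winner: swap
        ladder[l], ladder[w] = ladder[w], ladder[l]
    else:                          # move winner x% of the way up to the loser
        new_pos = w - int((w - l) * x)
        ladder.remove(winner)
        ladder.insert(new_pos, winner)

ladder = ["A", "B", "C", "D", "E"]
ladder_fight(ladder, winner="E", loser="A")   # E jumps halfway toward A
print(ladder)  # ['A', 'B', 'E', 'C', 'D']
```

The partial move is what makes the ladder never fully settle: a single upset only closes part of the gap, so rankings keep fluctuating as the answer says.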

Hope I could help at least a little.

DataWraith
Thank you, you helped a lot. :) These are interesting concerns (which items to choose etc).
Your fightswon/totalfights, where totalfights is how many fights the specific film has been in, is actually the best indicator of which film is the best, and gives very stable results. (See my answer for explanation). What exactly do you think is wrong with it?
KernelJ
Oh I see your point, my system assumes that each person can only vote once on some random thing. It trivially works for a single user voting over and over also. If you have a few users, you need to add another level of complexity to the system graph so that each user is weighted the same, and this may be inaccurate if many people haven't done very many votes. The answer to this is sumoverallusers(fightswon/totalfights), i.e. a voting history is recorded for all users, and you just add all the scores. If totalfights is 0, you could assume fightswon/totalfights is a default value 0.5.
KernelJ
A: 

I heartily recommend the book Programming Collective Intelligence for all sorts of interesting algorithms and data analysis along these lines.

DanDan
+1  A: 

After having thought things through, the best solution for this film ranking is as follows.

Required data:

  • The number of votes taken on each pairing of films.
    • And also a sorted version of this data grouped like in radix sort
  • How many times each film was voted for in each pairing of films

Optional data:

  • How many times each film has been involved in a vote for each user

How to select a vote for a user:

  • Pick out a vote selection from the sorted list in the lowest used radix group (randomly)
  • Optional: use the user's personal voting stats to filter out films they've been asked to vote on too many times, possibly moving onto higher radix buckets if there's nothing suitable.

How to calculate the ranking score for a film:

  • Start the score at 0
  • Go through each other film in the system
    • Add voteswon / votestaken versus this film to the score
      • If no votes have been taken between these two films, add 0.5 instead (This is of course assuming you want new films to start out as average in the rankings)

Note: The optional stuff is just there to stop the user getting bored, but may be useful for other statistics also, especially if you include how many times they voted for that film over another.

Making sure that newly added films have statistics collected on them ASAP, and that votes are very evenly distributed across all existing films, is vital to keeping stats correct for the rest of the films. It may be worth staggering the entry of a bunch of new films to the system to avoid temporary glitches in the rankings (though the effect would be neither immediate nor severe).
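The scoring rule above can be sketched as follows (the dictionaries are keyed on ordered (film, opponent) pairs, and the sample data is invented):

```python
# Score a film by summing wins/votes against every other film,
# counting unseen pairings as an average 0.5.

def film_score(film, films, wins, votes):
    """wins[(a, b)]: votes for a over b; votes[(a, b)]: total votes a vs b."""
    score = 0.0
    for other in films:
        if other == film:
            continue
        pair = (film, other)
        if votes.get(pair, 0) == 0:
            score += 0.5          # no data: treat the matchup as average
        else:
            score += wins.get(pair, 0) / votes[pair]
    return score

films = ["A", "B", "C"]
votes = {("A", "B"): 4, ("B", "A"): 4}      # A vs B fought 4 times
wins = {("A", "B"): 3, ("B", "A"): 1}       # A won 3 of them
print(film_score("A", films, wins, votes))  # 3/4 + 0.5 (never met C) = 1.25
```

Because every pairing contributes at most 1 to the sum, the score is bounded by the number of other films, and a new film starts at exactly half that bound - i.e. average, as intended.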

===THIS IS THE ORIGINAL ANSWER===

The problem is actually very easy. I am assuming here that you want to order by preference to vote for the film i.e. the #1 ranked film is the film that is most likely to be chosen in the vote. If you make it so that in each vote, you choose two films completely at random you can calculate this with simple maths.

Firstly each selection of two films to vote on is equally likely, so results from each vote can just be added together for a score (saves multiplying by 1/nC2 on everything). And obviously the probability of someone voting for one specific film against another specific film is just votesforthisfilm / numberofvotes.

So to calculate the score for one film, you just sum votesforthisfilm / numberofvotes for every film it can be matched against.

There is a little trouble here if you add a new film which hasn't had a considerable number of votes against all the other films, so you probably want to leave it out of the rankings until a number of votes has built up.

===WHAT FOLLOWS IS MOSTLY WRONG AND IS MAINLY HERE FOR HISTORICAL CONTEXT===

This scoring method is derived from a Markov chain of your voting system, assuming that all possible vote questions were equally likely. [This first sentence is wrong: all vote questions would have to be equally likely in the Markov chain to get meaningful results] Of course, this is not the case, and actually you can fix this as well, since you know how likely each vote question was - it's just the number of votes that have been done on that question! [The probability of getting a particular vote question is actually irrelevant, so this doesn't help] In this way, using the same graph but with the edges weighted by votes done...

Probability of getting each film given that it was included in the vote is the same as probability of getting each film and it being in the vote divided by the probability it was included in the vote. This comes to sumoverallvotes((votesforthisfilm / numberofvotes) * numberofvotes) / totalnumberofvotes divided by sumoverallvotes(numberofvotes) / totalnumberofvotes. With much cancelling this comes to votesforthisfilmoverallvotes / numberofvotesinvolvingthisfilm. Which is really simple!

KernelJ
Thank you very much! Does it make sense to divide by numberofvotes and then multiply with numberofvotes? Your approach doesn't consider which items are beaten. So if itemA beats weak items in 80% and itemB beats strong items in 70% which one is better? It should be itemB ...
This might actually be a good way to calculate a global index...but it's not very helpful in determining an individual's preference. I currently have around 300 movies in flickchart and have voted 1000 times. But there are 300*299/2 possible matchups. Each movie has been involved in around 6 votes. How am I going to rank 300 movies when all vote totals are either 1, 2, 3, 4, 5, or 6? Not to mention that most movies haven't been paired and those that have, have only been paired once.
David Berger
@David: I agree completely, that was never my intention. This approach converges to the correct answer as the number of votes increases. For a single person they would not really bother to vote enough times to get meaningful results. However, the amount of data to be stored for each user is quite small, so you should get quite an accurate global ranking. It would be interesting to see if a Markov chain approach could be used to guess a personal ranking by stereotyping people, or any approach with only the per film totals. Indeed this would be brilliant for suggesting movies you should see!
KernelJ
@marco92w: The * numberofvotes came from the factor numberofvotes / totalnumberofvotes, which is the probability this particular vote selection was taken, as discussed in the previous paragraph. I pulled totalnumberofvotes outside because it doesn't depend on the actual vote selection. So yes, it does make sense. My latter solution depends on the server trying to maintain that the vote selection is effectively random (guaranteed in the first solution), by trying to get the numberofvotes on all vote selections to be equal. This means new films added will be compared often to a random film.
KernelJ
Actually, having rethought the whole problem, I made a massive blunder in my answer by forgetting that the probability of getting any given selection of 2 films in a vote has to be uniformly distributed for the Markov chain to make sense. This is why, marco, you were completely right: the second solution doesn't make any sense at all! Also, there's absolutely no reason why the selection of two votes has to be random in the first solution. And the way to solve the problem in the first solution is the same as for the problem in the second, faulty one. Compare new films against all others fast!
KernelJ
David Berger
Cool, I'll look into that. I wonder how they measure the 'accuracy'... I was thinking of using a genetic algorithm of some sort to generate organisms with preferences. You could test it on something like how much they like a string.
KernelJ