This question is more about logic than programming at the moment. Once I understand which algorithm(s) I need, I'll look into how to implement them.

I've got a list of items in a database that need to be voted up or down by users to determine if they are correct or not. The aim is to provide a % for each item to show how reliable the item information is.

There are a few criteria to take into account:

  • Votes are not absolute - each user’s vote weight depends on their karma.
  • User karma should be calculated based on their votes - for example, if the user submits an item and other users vote to confirm that it is correct, that user's karma would increase. Karma could also be given if a user votes for an item in the same direction that other users with high karma have voted. If they vote in the opposite direction to other users with high karma, their vote would be considered incorrect and although it'd lower the score of the item it'd also lower their karma level, making them less influential in future voting.
  • Users can cast negative votes as well as positive votes.
  • Calculated scores of items should take into account the age of the item (over time the score would decrease as the item could become less reliable).

Does anyone have any recommendations on the best algorithm(s) for doing this, or any tips on how to implement this in a programming language (such as PHP)?

+1  A: 

I assume that, when calculating the karma of an item, you only consider the karma earlier voters had at the time of their vote and not their current karma (which may have changed since then). Using current karma would result in a recursive function that would probably involve all items and all users.
Another assumption is that the karma is indeed absolute but is recalculated whenever a new vote is cast, as votes are less frequent than views.
I'd store all votes of all users for each item, together with the karma they had at the time of the vote and the voting direction.
The final assumption: you add karma to the submitter not right after a vote but after a certain timespan. If you added it right away, the submitter's karma would go up and down quite often and cause heavy jitter in your system. When you get a new vote, I'd first calculate the new karma of the item and then add karma to the user depending on the absolute karma change of the item:

The karma of an item is the sum of the karma of all voting users, with down votes counted as negative. For example, you have three votes: one up with 50 karma, one up with 150 karma, one down with 30 karma. This gives 50 + 150 - 30 = 170, so the item has a karma of +170.
Once a new user votes, you recalculate the karma of the item, taking the new vote into account: (previous example) a new user votes up with 10 karma. The new karma of the item is +180. The difference between the item's old and new karma is the karma the user gets: (previous example) the user's vote changed the item's karma by +10, so the user gets +10 karma (for future votes). The downside of this idea is that high-karma users gain new karma very fast, so you should probably add some limiting factor here too (like a logarithm) to scale it properly. As you want to consider the age of the item too, you can multiply the gained karma points by a factor depending on the age (for example, if the item is older than 5 days, the user doesn't get any karma at all: 5 days minus the time it took to vote, multiplied by the changed karma value).
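The item-karma arithmetic above can be sketched in a few lines. This is only a rough illustration in Python (the same logic ports directly to PHP), and the function names, the use of `log1p` as the limiting factor, and the linear 5-day age factor scaled to the range 0 to 1 are all assumptions made for the example, not something fixed by the description above.

```python
import math

def item_karma(votes):
    """Sum of voter karma; votes are (direction, karma) pairs, direction is +1 or -1."""
    return sum(direction * karma for direction, karma in votes)

def karma_reward(old_item_karma, new_item_karma, age_days, max_age_days=5.0):
    """Karma granted to the voter for one vote (hypothetical helper).

    Assumptions: the reward is based on the absolute change in item karma,
    damped with log1p so high-karma voters do not snowball, and scaled
    linearly down to zero once the item is max_age_days old.
    """
    change = abs(new_item_karma - old_item_karma)
    damped = math.log1p(change)
    age_factor = max(0.0, (max_age_days - age_days) / max_age_days)
    return damped * age_factor

# The example from the text: two up votes (50 and 150 karma), one down vote (30 karma).
votes = [(+1, 50), (+1, 150), (-1, 30)]
old = item_karma(votes)          # 170
votes.append((+1, 10))           # a new user with 10 karma votes up
new = item_karma(votes)          # 180
reward = karma_reward(old, new, age_days=0.0)
```

Without the logarithm the voter would simply receive the full change of 10; with it, the reward grows much more slowly as the voter's karma increases.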

This is of course a very vague draft of the system you want to implement, and I don't know if it fits your idea. It can probably be modified to add other factors as well:
You can determine the % relevance with (absolute positive karma / absolute negative karma): values less than 1 mean there is more negative karma than positive karma, and the other way round. But for a reliable % value you need some value to compare it to, in my opinion (be it constant or calculated otherwise).

Fge
+3  A: 

Hi Rich,

Read this first: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html

It's an introduction to a mathematical concept known as Wilson score confidence intervals for Bernoulli parameters.

That article is a great primer on how to use your users' votes to calculate a score that is actually useful and mathematically sound. Do this, and you're already ahead of Amazon.com.

Then, I think you probably need to tweak that formula a bit. It uses p for the fraction of positive votes. You might need to change how p is calculated so that it reflects the karma of the user who cast each vote.
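One possible way to do that tweak, sketched in Python: compute p as the karma-weighted share of up votes and plug it into the lower bound of the Wilson score interval. Treat this as an assumption-laden sketch rather than the article's method. In particular, the function name is made up, n is taken to be the plain number of votes, and z = 1.96 (roughly 95% confidence) is just a common default.

```python
import math

def weighted_wilson_lower_bound(votes, z=1.96):
    """Lower bound of the Wilson score interval with a karma-weighted p.

    votes: list of (direction, karma) pairs, direction +1 or -1, karma > 0.
    Assumption: p is the karma-weighted fraction of up votes, while n stays
    the raw vote count, so the interval still narrows with more voters.
    """
    n = len(votes)
    total_weight = sum(karma for _, karma in votes)
    if n == 0 or total_weight <= 0:
        return 0.0
    positive_weight = sum(karma for direction, karma in votes if direction > 0)
    p = positive_weight / total_weight
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)

# Same three voters, but the down vote comes from a low- or a high-karma user.
score_low_karma_down = weighted_wilson_lower_bound([(+1, 50), (+1, 150), (-1, 30)])
score_high_karma_down = weighted_wilson_lower_bound([(+1, 50), (+1, 150), (-1, 300)])
```

The result lies between 0 and 1, so multiplying by 100 gives the percentage the question asks for. A down vote from a high-karma user pulls the score down further than one from a low-karma user.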

Finally, to take the age into account, you multiply the outcome of the formula by an age multiplier. For example, if you want the outcome to become less relevant by 1% for each day it ages, multiply it by 0.99^age_in_days.
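As a small Python sketch of that age multiplier (the function and parameter names are invented for the example, and the input score is assumed to be a value between 0 and 1 from whatever formula you end up using):

```python
def age_adjusted_score(score, age_in_days, daily_retention=0.99):
    """Decay a score by a fixed percentage per day: score * 0.99 ** age_in_days."""
    return score * daily_retention ** age_in_days

fresh = age_adjusted_score(0.80, age_in_days=0)       # unchanged on day 0
month_old = age_adjusted_score(0.80, age_in_days=30)  # noticeably lower after a month
```

With a 1% daily decay the score roughly halves after about 69 days, so pick the retention factor according to how quickly your items really go stale.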

In a nutshell, that is the path I would follow. Hope this helps.

Edward