views:

211

answers:

3

I'm looking for a good algorithm recommendation.

I have Users and Achievements. Users create Achievements and then give them to other Users. Associated with each Achievement is the point value that the user specifies. A User's total points is the sum of all their achievements.

Basically:

Achievement :
    owner = Alias
    points = int

User :
    achievements = list(Achievement)
    def points() :
        sum([achievements.points])

Ok, so this system is obviously very game-able. You can make many accounts and give tons of achievements to eachother. I'm try to reduce that a little bit by scaling the point values to something different than what the user specified.

  1. Assuming all users are honest, but they just gauge difficultly differently. How should I normalize the point values? AKA one user gives 5 points for every easy achievement, and another gives 10 points, how can I normalize them to one value. The goal would be a distribution where points are proportional to difficulty.
  2. If one user isn't very good at judging point values, how can I figure out difficulty based on the number of users that have gotten the achievement?
  3. Assume that Users could be mostly partitioned into disjoint groups with one User giving achievements to a whole set of other ones. Does that help the previous two algorithms? For example, User A only gives achievements to Users that end with an odd number and User B only gives achievements to Users that end with an even.
  4. If everyone is malicious, how close can I get to not having user's being able to hyper-inflate their point values?

Note: The quality of a giving users is not in any way related to how many achievements he has received. Many givers are just bots that haven't received anything themselves but automatically reward users for doing certain actions.

My current plan is something like this. I have an allocation of 10 points / person that has got an achievement from me. If I have given out 10 achievements to 55 people total, my allocation is 550. Then this is given to each achievement based on the number of people who got it. If the distribution was [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] people who got each achievement, then the point values would be [50, 25, 16.6, 12.5, 10, 8.3, 7.1, 6.25, 5.5, 5].

Any problems with my approach and alternative recommendations are welcome and appreciated. Also, post other cases that you can think of that I've missed, and I'll add them to the list. Thanks!

A: 

I think that in your system, as in stackoverflow, digg, slashdot, etc. your basic goals are to

  1. Indentify honest users
  2. Promote their actions

Generally we identify honest users by their actions: those accounts that have existed for a long time on the site and have been vetted by other users, and by you. Stack overflow uses the reputation score for this, slashdot uses karma points.

Once you identify these honest users then you can have their votes count in proportion to the reputation score: the more honest a user seems to be the more we trust his achievements.

Thus, you might give new accounts an initial score of 10. That user can then give any number of achievements he wants but their actual total value will be 10 (like the proportional allocation you suggest). That is, if a new user gives 100 achievements (all worth the same number of points) then each one will be worth .1 points because his score is 10. Then, as that user gets achievements from other users his score increases.

Basically, I'm suggesting you use pagerank, but instead of ranking web pages you are ranking users and instead of hyperlinks the links are achievements given by that user to others.

That's one way to solve this problem. There are many others. It depends on your specific needs. Auctions are always fun. You can have everyone bid on an achievement before it is actually achieved in order to establish the price (score) that the community places on that achievement. You will need to limit the amount of 'money' people have.

Jose M Vidal
Thank you for the response. I would love to have done the page-rank approach but many of my users are just 'bots' that don't actually get any achievements themselves. People build these bots but will not always agree on what '10 points' means for easyness. I'd rather avoid explicit bidding on achievements and let the implicit characteristics make the value since I don't know how much I can trust the community just yet. In light of this, is my algorithm the 'best' way to go?
Paul Tarjan
I am confused by your description. Namely, when you say "I have an allocation of 10 points / person that has got an achievement from me." is the "me" referring to you, the owner of the website, or is it referring to anyone with an account on the website?
Jose M Vidal
Sorry, *me* was the person giving the achievement. Any person on the website can give them. Rephrasing in post.
Paul Tarjan
A: 

I've been struggling with this type of problem on my own site. If you have a large corpus of existing data you can use as a baseline, score normalization seems pretty effective. First get the mean value and standard deviation for the user's created achievements:

SELECT AVG(Points) AS user_average, 
STDDEV_POP(Points) AS user_stddev
FROM Achievements WHERE Owner = X

Use these values to calculate a context-free "z-score":

$zscore = ($rating - $user_average) / $user_stddev;

Get the mean and standard deviation for all achievments:

SELECT AVG(Points) AS all_average, 
STDDEV_POP(Points) AS all_stddev 
FROM Achievements

Use these values to create a normalized "t-score":

$tscore = $all_average + ($all_stddev * $zscore);

Then use the t-score as your internal representation of an achievement's value. YMMV. :)

Interesting... Your input is $rating and out output is $tscore right? What I'm wondering is why do you trust the rating of the user at all? It seems the corpus has more information and doesn't lie.
Paul Tarjan
Due to the limit on comment length, please see my response as a separate answer. I'm new here, maybe I'm using the site wrong?
A: 

Correct, $rating is input and $tscore is the normalized output.

Ideally, everyone would assign points for their achievements on an identical scale. One point for stupid or trivial achievements, ten points for modest achievements, 50 points for truly epic achievements, or whatever. But people have very different behavior when it comes to assigning scores. Some will be very generous, and make every achievement worth the max. Others will be strict and accurate, adhering carefully to the scale as it relates to the difficulty of the achievement. Others may think it's dumb that people worry about points, and assign the minimum value for all the achievements they create.

Normalization attempts to handle these individual abnormalities and fit everyone's ratings to the same scale. It's like what they do with the judges' scores in the Olympics. You don't "blindly trust" the value a user assigned to an achievement, but it's something you want to account for if it's part of the system. Otherwise you could presumably just hard-code the point value of achievements, limit how often they can be created, and it sounds like that would curb the worst abuse. But the score is useful because, after normalization, you can figure out what the achievement's value would be worth if it was created by a stereotypically average user. That makes it difficult for people to "game" the system because the further they get from the average value and distribution for achievements, the more their own values get normalized back towards the baseline.

I should mention that I am not a professionally trained programmer, and I have never taken a statistics class or any higher math. Due to my own limitations of understanding, perhaps I'm not the best person to be explaining this. But I have been struggling with a similar problem on my own site (user-to-user ratings), and after trying numerous approaches this one seems like the most promising. Most of the inspiration for the implementation came from http://www.ericdigests.org/2003-4/score-normilization.html so you might like to read that as well.