I've been looking into implementing a Stack Overflow-like site (for a completely different area of knowledge) for a little while now, and I have a question about what people think is the best way to implement reputation for a system like this.

Of course, that's a broad topic, so here are some specific questions that I have:

  • Calculation of Reputation

While I do believe that every action the user takes is persisted in some manner (asking/answering a question, voting on a question/answer, being voted on), which would allow reconstruction of a reputation score from scratch, the more I look at the site, the more implausible it seems that this is done every time a reputation score is needed.

To that end, I am of the belief that the user's reputation is only calculated once per interval (every day, two days, week, month, etc.) and then activity past that interval is added to the pre-calculated score.

If this is indeed the case, does one think that this would be an automated process that occurs once at a specified interval, or is it something that happens the next time the user tries to perform an action which would affect reputation?

My guess is that because other people can have an impact on your reputation, calculating it when you perform an action that affects reputation is a bad idea, unless that operation is performed every time anyone performs an action that affects your reputation.

Or perhaps I have all of this wrong? Since any one action on the site can only really affect one person's reputation at a time, perhaps the reputation is kept as a running tally and changed every time an action is performed?

After all, the only actions that can really affect a user's reputation are upvoting, downvoting, and answer acceptance, so it wouldn't seem too hard to actually keep a running total.
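To make the running-total idea concrete, here is a minimal sketch, assuming each vote is persisted as an event and the user's total is adjusted in the same step. The point values in `POINTS` and the names `User` and `ReputationLedger` are hypothetical, chosen only for illustration; they are not taken from any real site.

```python
from dataclasses import dataclass, field

# Hypothetical point values; the real numbers are a design decision.
POINTS = {"upvote": 10, "downvote": -2, "accept": 15}


@dataclass
class User:
    name: str
    reputation: int = 0  # the running tally


@dataclass
class ReputationLedger:
    # Every action is still persisted, so the score can be rebuilt from scratch.
    events: list = field(default_factory=list)

    def record(self, user: User, action: str) -> int:
        """Persist the action and update the running tally in one step."""
        delta = POINTS[action]
        self.events.append((user.name, action, delta))
        user.reputation += delta
        return user.reputation

    def recompute(self, user: User) -> int:
        """Rebuild the score from the event log, e.g. for an occasional audit."""
        return sum(delta for name, _, delta in self.events if name == user.name)


ledger = ReputationLedger()
alice = User("alice")
for action in ("upvote", "upvote", "downvote", "accept"):
    ledger.record(alice, action)
```

In a real database the insert of the vote and the update of the total would presumably share one transaction, so the tally cannot drift from the log.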

Thoughts?

  • Permissions Based on Reputation

Given that permissions on the site are reputation-based, and it is a fluid system, if a user wobbles back and forth over a permission-boundary, what happens? Do they gain and then lose the permission?

Also, what are the thoughts on the impact of the above questions in relation to this one?
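One way to think about the boundary case is to check the permission against the current score at the moment of each action, rather than storing the permission as a flag. A rough sketch, assuming hypothetical thresholds in `THRESHOLDS` and a hypothetical helper `can`:

```python
# Hypothetical thresholds; the actual values are up to the site.
THRESHOLDS = {"upvote": 15, "downvote": 100, "edit": 2000}


def can(reputation: int, permission: str) -> bool:
    """Evaluate the permission against the score as it is right now."""
    return reputation >= THRESHOLDS[permission]


reputation = 16
allowed_before = can(reputation, "upvote")  # True: 16 >= 15

reputation -= 2  # the user is downvoted and drops to 14
allowed_after = can(reputation, "upvote")   # False: 14 < 15
```

With this approach a user who wobbles across a boundary simply gains and loses the permission as the score moves, and how fresh that decision is depends on how fresh the stored score is.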


Solution

Eventually I might go with my/Adam Davis' answer, but I will only use that if scalability is an issue. For now, I believe that updating a running total is the best way. I'll post another question though if I find it is not.


Implementation Details

The platform I am developing for is ASP.NET. Specifically, these are the technologies/components/services involved:

As for whether or not I will make it open source, it's separated pretty well so that it could easily be modified for any subject. There is some work to be done on this end though, as nothing is templatized, and the controllers are still in the website dll, instead of library dlls.

The controllers will definitely be refactored out into a referenced assembly, but I don't really have any plans to templatize the views. For me, it's just too specific to the site in question.

At this point, this is what I have:

  • Authentication

The login/logout routines are in place, along with automatic profile generation, but no editing on profiles yet.

  • Reputation

There is no reputation system in place yet.

  • Authorization

Since there is no reputation system in place, there is no authorization either.

  • Questions/Answers

The ability to post a new question is in place. This includes spam analysis, with subsequent CAPTCHA validation if the heuristic says that the input is spam. This mechanism is generalized, so it can be applied to all input that is going to be displayed on the site (questions, answers, comments).

Answers are going to be worked on next. Since all the input validation/spam detection is already in place, it's just a matter of linking up the input to the models and getting the validation correct.

The model for past revisions of questions and answers is in the system, but there is no interface for it yet.

After answers, I am going to work on upvoting/downvoting questions/answers. I'm going to have to give some people reputation just to give them the ability to upvote. If everyone starts at zero, then no one has the reputation to vote, nor will anyone gain any.

Then all the other little things will have to be put in of course, but one day at a time. =)

  • Update 7/28/2010

Just wanted to let everyone know that calculating the reputation as a running total is working just fine, with no scalability issues. Granted, I don't have the throughput that SO does, but it's not non-existent either.

For those that wish to know, the site is based on the video game "Street Fighter 4", and was developed so that I can easily deploy sites just like it for other game topics (including the upcoming Marvel vs. Capcom 3 and ultimately the recently announced Street Fighter X Tekken game).

The site is:

http://sf4answers.com

+4  A: 

Calculation of reputation is best done as you suggest, I believe. SO does something substantially similar - reputations have been recalculated for all users several times in the recent past, which suggests a discrete number that is updated probably daily. Given the reputation graph shows daily changes, they may simply have a rep-per-day table.

The permissions do indeed go away if one loses reputation. Otherwise someone could get just enough reputation to do bad things, and the offensive vote reputation penalty won't act until the next discrete reputation calculation.
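A rough sketch of the "discrete number plus recent activity" idea, assuming a stored snapshot total per user and an event log of `(user_id, timestamp, delta)` tuples. The function name `reputation_at` and the data layout are assumptions for illustration, not a description of how SO actually stores this.

```python
from datetime import datetime


def reputation_at(snapshot_total, snapshot_time, events, user_id):
    """Snapshot score plus only the deltas recorded after the snapshot.

    events: iterable of (user_id, timestamp, delta) tuples (assumed layout).
    """
    recent = sum(
        delta
        for uid, ts, delta in events
        if uid == user_id and ts > snapshot_time
    )
    return snapshot_total + recent


snapshot_time = datetime(2010, 7, 28, 0, 0)
events = [
    (1, datetime(2010, 7, 27, 23, 0), 10),  # already folded into the snapshot
    (1, datetime(2010, 7, 28, 9, 30), 10),  # after the snapshot: counted
    (2, datetime(2010, 7, 28, 10, 0), 15),  # a different user: ignored
    (1, datetime(2010, 7, 28, 11, 0), -2),  # after the snapshot: counted
]
current = reputation_at(500, snapshot_time, events, user_id=1)  # 500 + 10 - 2
```

The sum only has to cover events since the last snapshot, which is the point of restricting it to a time period.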

Adam Davis
@Adam Davis: So you are leaning towards the once-per-interval over the running tally method, correct? If so, then the question is, is the recalculation on a fixed interval separate from the user's actions, or tied to another user's next action after a threshold has been breached?
casperOne
Once per interval, with the 'real time' score being added to that. How and when it's updated, though, depends on the design of the database and code.
Adam Davis
@Adam Davis: What are the costs that I am missing with using a running-tally approach then? Since every action that could influence reputation requires a DB operation, what's the harm in updating one more value?
casperOne
You'll have to run some tests yourself. Keep in mind that your "action table" is going to have hundreds of thousands of new actions a day, and running a sum along one affected user ID for the whole table is a lot more expensive than restricting the sum to a time period.
Adam Davis
But again, it depends on your DB backend. I doubt my mental model matches your database schema, so I can only guess based on what I know. If you are simply updating a single record per user each time the rep changes, then that would certainly be fast enough.
Adam Davis
@Adam Davis: Well, I'm thinking I would have a table for votes, with links to what is voted on. That's really all that's pertinent here. If I have to write the "vote" action every time, then when performing that action, why not just update the tally of total reputation when that table is hit?
casperOne
A: 

One thing that comes to mind is a ranking system that uses the logarithmic scale. Sites like CodeProject use it. For example, on CP, the article's rating is average vote times log_10 of the number of votes. This means that masses of people liking something only tilt the scale so much. I like it.
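As a small illustration of the formula described above (average vote times log_10 of the number of votes), here is a sketch; the function name `weighted_rating` is made up for this example.

```python
import math


def weighted_rating(votes):
    """Average vote multiplied by log10 of the number of votes."""
    if not votes:
        return 0.0
    average = sum(votes) / len(votes)
    return average * math.log10(len(votes))


ten_votes = weighted_rating([4] * 10)       # 4.0 * log10(10)  = 4.0
hundred_votes = weighted_rating([4] * 100)  # 4.0 * log10(100) = 8.0
```

Ten times as many votes only adds one more multiple of the average, which is the dampening effect. Note that with this exact formula a single vote gives a rating of zero, since log_10(1) = 0.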

Dmitri Nesteruk
@Dmitri Nesteruk: And where is this ranking applied? Is it applied to the user, or is it applied to order search results, perhaps? I like the idea, but I need more information on where the ranking would be applied.
casperOne
@Dmitri Nesteruk: If it IS applied to the user, then one has to think about how reputation is calculated across the board, and that might be a task I'm not keen on taking up.
casperOne
+1  A: 

I am also trying to create a Stack Overflow app (open source). I thought about the "once in a while" update, but it doesn't work that way, except for rep gained when one of your answers is accepted.
You will also notice that your rep changes immediately (by two) when you accept answers of others.
The reason why such stuff has to be real time is:
1. Prevent spammers (you need them to receive the penalty immediately, and not several minutes after they do what they intended to do).
2. If you upvote someone, you still have to save this action somewhere, so why not where it should be saved from the beginning?

Itay Moav
@Itay Moav: Curious, how far along are you in your implementation?
casperOne
Somewhere in the middle, check for yourself: phpancake.sourceforge.net/demo. Not too much time (two kids and a wife to feed :-D)
Itay Moav
@Itay Moav: You are definitely further along than I am. Do you have spam/captcha installed? Also, it's in PHP? If it was in .NET, I'd suggest combining forces!
casperOne
For .NET, you might want to check ra-ajax; they have an open source .NET version of SO. I am just now beginning to think about the least intrusive ways to protect my site (see my latest question; it is general and can help you too).
Itay Moav
@Itay Moav: I've seen it, but I found it to be too different from the SO model in certain areas and I didn't want to hack apart someone else's code. Additionally, there doesn't seem to be anything regarding spam or human verification in there (I might have missed it) and that's a major concern.
casperOne
A: 

If you decide to do a daily reputation and vote cap, you should give people the option to set when their "day" starts. On SO, the start of the day can be in the middle of a user's day. As others have said, you should probably use a logarithmic scale for each post.
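A minimal sketch of what a daily cap with a configurable day start could look like. The cap value `DAILY_CAP`, the helper names `capped_gain` and `day_bucket`, and the choice to leave losses uncapped are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

DAILY_CAP = 200  # hypothetical cap on reputation gained per day


def capped_gain(earned_today, delta, cap=DAILY_CAP):
    """Return the part of a gain that still fits under today's cap.

    Losses pass through unchanged (an assumption of this sketch).
    """
    if delta <= 0:
        return delta
    return max(0, min(delta, cap - earned_today))


def day_bucket(ts, day_start_hour=0):
    """Map a timestamp to the 'day' it belongs to.

    day_start_hour is the UTC hour at which a day begins: 0 means every
    day starts at 00:00 UTC, 6 means days run from 06:00 to 06:00 UTC.
    """
    shifted = ts.astimezone(timezone.utc) - timedelta(hours=day_start_hour)
    return shifted.date()


early = datetime(2010, 7, 28, 3, 0, tzinfo=timezone.utc)
default_day = day_bucket(early)                    # 2010-07-28
shifted_day = day_bucket(early, day_start_hour=6)  # 2010-07-27
```

With a 06:00 start, a vote at 03:00 UTC still counts toward the previous day, so the same event can land under different caps depending on the setting.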

Zifre
@Zifre: I don't think that's a good idea, because then you are going to overload the system trying to run automated processes on each user's schedule. It also gives them input into the processing of the system, which isn't really fair. Finally, SO has indicated that start-of-day is 00:00 GMT.
casperOne
It possibly lends itself to gaming problems as well.
Adam Davis
You could prevent gaming by only allowing users to change the setting for the next day, extending the current day until the next start of their day. And you could have only 4-6 automated processes, which should cover anyone's night.
Zifre