views:

307

answers:

2

Lets say I have around 1,000,000 users. I want to find out what position any given user is in, and which users are around him. A user can get a new achievement at any time, and if he could see his standing update, that would be wonderful.

Honestly, every way I think of doing this would be horrendously expensive in time and/or memory. Ideas? My closest idea so far is to order the users offline and build percentile buckets, but that can't show a user his exact position.

Some code if that helps you django people :

class Alias(models.Model) :
    awards = models.ManyToManyField('Award', through='Achiever')

    @property
    def points(self) :
        p = cache.get('alias_points_' + str(self.id))
        if p is not None : return p

        points = 0
        for a in self.achiever_set.all() :
            points += a.award.points * a.count

        cache.set('alias_points_' + str(self.id), points, 60 * 60) # 1 hour
        return points

class Award(MyBaseModel):
    owner_points = models.IntegerField(help_text="A non-normalized point value. Very subjective but try to be consistent. Should be proporional. 2x points = 2x effort (or skill)")
    true_points = models.FloatField(help_text="The true value of this award. Recalculated with a cron job. Based on number of people who won it", editable=False, null=True)

    @property
    def points(self) :
        if self.true_points :
            # blend true_points into real points over 30 days
            age = datetime.now() - self.created
            blend_days = 30
            if age > timedelta(days=blend_days) :
                age = timedelta(days=blend_days)
            num_days = 1.0 * age.days / blend_days
            r = self.true_points * num_days + self.owner_points * (1 - num_days)
            return int(r * 10) / 10.0

        else :
            return self.owner_points


class Achiever(MyBaseModel):
    award = models.ForeignKey(Award)
    alias = models.ForeignKey(Alias)
    count = models.IntegerField(default=1)
+3  A: 

I think Counterstrike solves this by requiring users to meet a minimum threshold to become ranked--you only need to accurately sort the top 10% or whatever.

If you want to sort everyone, consider that you don't need to sort them perfectly: sort them to 2 significant figures. With 1M users you could update the leaderboard for the top 100 users in real time, the next 1000 users to the nearest 10, then the masses to the nearest 1% or 10%. You won't jump from place 500,000 to place 99 in one round.

Its meaningless to get the 10 user context above and below place 500,000--the ordering of the masses will be incredibly jittery from round to round due to the exponential distribution.

Edit: Take a look at the SO leaderboard. Now go to page 500 out of 2500 (roughly 20th percentile). Is there any point to telling the people with rep '157' that the 10 people on either side of them also have rep '157'? You'll jump 20 places either way if your rep goes up or down a point. More extreme, is that right now the bottom 1056 pages (out of 2538), or the bottom 42% of users, are tied with rep 1. you get one more point, and you jumped up 1055 pages. Which is roughly a 37,000 increase in rank. It might be cool to tell them "you can beat 37k people if you get one more point!" but does it matter how many significant figures the 37k number has?

There's no value in knowing your peers on a ladder until you're already at the top, because anywhere but the top, there's an overwhelming number of them.

Dustin Getz
someone please edit this to be more articulate, im going to bed.
Dustin Getz
I was trying to give the users a goal by showing them people above them to beat, but not too far as to be unreachable.
Paul Tarjan
the jitter towards the bottom of the distribution will be so great that even going up or down 1 point will drop you or gain you several thousand places out of 1M. you should measure what your score distribution looks like.
Dustin Getz
A: 

One million is not so much, I would try it the easy way first. If the points property is the thing you are sorting on that needs to be a database column. Then you can just do a count of points greater than the person in question to get the rank. To get other people near a person in question you do a query of people with higher points and sort ascending limit it to the number of people you want.

The tricky thing will be calculating the points on save. You need to use the current time as a bonus multiplier. One point now needs to turn into a number that is less than 1 point 5 days from now. If your users frequently gain points you will need to create a queue to handle the load.

Jason Christa