I've been working on a leaderboard feature for my application: basically, stack-ranking users according to their score. I'm currently tracking the score on an individual basis. My thought is that this leaderboard should be relative instead of absolute, i.e., instead of showing the top 10 highest-scoring users across the site, it shows the top 10 within a user's friend network. This seems better because everyone has a chance to be #1 in their network, and there is a form of friendly competition for those who are interested in this sort of thing. I'm already storing the score for each user, so the challenge is how to compute the rank of that score efficiently in real time. I'm using Google App Engine, so there are some benefits and limitations (e.g., IN [array] queries perform a sub-query for every element of the array and are limited to 30 elements per statement).

For example

1st Jack 100

2nd John 50

Here are the approaches I came up with, but they all seem inefficient, and I thought this community could come up with something more elegant. My sense is that any solution will likely run as a cron job, and that I will store a daily rank and list order to optimize read operations, but it would be cool if there were something more lightweight and real time.

1. Pull the list of all users of the site ordered by score. For each user, pick their friends out of that list and create new rankings. Store the rank and list order. Update daily. Cons: if I get a lot of users, this will take forever.

2a. For each user, pick their friends, and for each friend pick the score. Sort that list. Store the rank and list order. Update daily. Record the last position of each user so that the pre-existing list can be reused for re-ordering at the next update, to make it more efficient (may save sorting time).

2b. Same as above, except only compute the rank and list order for people whose profiles have been viewed in the last day. Cons: the rank is only up to date for the 2nd person that views the profile.
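As a rough illustration of approach 2a, here is a minimal Python sketch that collects scores for a user and their friends in batches of at most 30 ids (to stay within the IN-query limit mentioned above) and then ranks them. The `fetch_scores` callable is a hypothetical stand-in for whatever datastore query the application actually uses; it is not an App Engine API.

```python
def chunked(items, size=30):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def friend_leaderboard(user_id, friend_ids, fetch_scores):
    """Return [(rank, user_id, score), ...] for a user and their friends.

    `fetch_scores` is a hypothetical callable: it takes a list of up to
    30 user ids and returns a {user_id: score} dict (e.g. one IN query).
    """
    scores = {}
    for batch in chunked([user_id] + list(friend_ids)):
        scores.update(fetch_scores(batch))
    # Highest score first; ties broken by user id so the order is stable.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, uid, score)
            for rank, (uid, score) in enumerate(ordered, start=1)]
```

The resulting list could be stored as-is (rank plus list order) by a daily job; whether reusing the previous order actually saves sorting time would need measuring.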

+2  A: 

If writes are very rare compared to reads (a key assumption in most key-value stores, and not just in those;-), then you might prefer to take a time hit when you need to update scores (a write) rather than when you get the relative leaderboards (a read). Specifically, when a user's score changes, queue up tasks for each of their friends to update their "relative leaderboards" and keep those leaderboards as list attributes (which do keep order!-) suitably sorted (yep, the latter's a denormalization -- it's often necessary to denormalize, i.e., duplicate information appropriately, to exploit key-value stores at their best!-).

Of course you'll also update the relative leaderboards when a friendship (user to user connection) disappears or appears, but those should (I imagine) be even rarer than score updates;-).
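To make the write-time fan-out concrete, here is a small in-memory sketch in Python. The dictionaries `friends`, `scores` and `leaderboards` are hypothetical stand-ins for datastore entities, and the direct call to `rebuild_leaderboard` stands in for what would be a queued task on App Engine; none of this is a real datastore API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for datastore entities.
friends = defaultdict(set)        # user -> set of friends
scores = {}                       # user -> current score
leaderboards = defaultdict(list)  # user -> [(score, name), ...], high to low


def rebuild_leaderboard(owner):
    """Recompute one user's stored board from current scores."""
    members = friends[owner] | {owner}
    leaderboards[owner] = sorted(
        ((scores.get(m, 0), m) for m in members),
        key=lambda entry: (-entry[0], entry[1]),
    )


def set_score(user, new_score):
    """Write path: store the score, then refresh every board showing this user."""
    scores[user] = new_score
    for owner in friends[user] | {user}:
        rebuild_leaderboard(owner)  # would be a queued task in practice


def add_friendship(a, b):
    """A new friendship changes both users' boards."""
    friends[a].add(b)
    friends[b].add(a)
    rebuild_leaderboard(a)
    rebuild_leaderboard(b)
```

Reads then become a single lookup of `leaderboards[user]`; the cost is one board rewrite per friend on every score change, which is why this only pays off when writes are rare.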

If writes are pretty frequent, since you don't need perfectly precise up-to-the-second info (i.e., it's not financials/accounting stuff;-), you still have many viable approaches to try.

E.g., big score changes (rarer) might trigger the relative-leaderboards recomputes, while smaller ones (more frequent) get stashed away and only applied once in a while "when you get around to it". It's hard to be more specific without ballpark numbers about frequency of updates of various magnitude, typical network-friendship cluster sizes, etc, etc. I know, like everybody else, you want a perfect approach that applies no matter how different the sizes and frequencies in question... but, you just won't find one!-)
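One possible shape for the "stash small changes, recompute on big ones" idea, as a Python sketch. The class name `DeferredScore` and the default threshold of 10 are illustrative assumptions; the right threshold depends on the update frequencies and cluster sizes mentioned above.

```python
class DeferredScore:
    """Track a user's score and signal when a leaderboard recompute is due."""

    def __init__(self, score=0, threshold=10):
        self.score = score          # always up to date
        self.pending = 0            # change not yet reflected in leaderboards
        self.threshold = threshold  # hypothetical tuning constant

    def add_points(self, points):
        """Apply points; return True if friends' boards should be recomputed."""
        self.score += points
        self.pending += points
        if abs(self.pending) >= self.threshold:
            self.pending = 0
            return True
        return False

    def flush(self):
        """Periodic job: return True if stashed changes need publishing."""
        had_pending = self.pending != 0
        self.pending = 0
        return had_pending
```

Here `add_points` covers the "big change" trigger and `flush` covers the occasional catch-up pass for everything that stayed under the threshold.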

Alex Martelli
Thanks, this is helpful. Writes are very frequent, as users accumulate points just for using the application day to day. I hadn't thought of the approach of having some kind of 'delta check' to decide when to update the relative boards (e.g., queue up an update when someone's score jumps by 10 or more); that will ensure that recomputes only happen for very active users. I can then increase this constant if it gets hit more and more frequently, in order to minimize the re-computation exercise. Let me know if you have any other ideas.
Aneto
Some rough metrics. Each user will generate an average of 10 points which will be distributed across 10 friends - 1 point given to each friend. So if a typical user has 10 friends, that user will accumulate 100 points per day - I could potentially use 50 points as the threshold for updating across the network
Aneto
One idea might be: for a newbie user (say < 200 / 300 points or so) every update might be important, so you could set the recomputation threshold lower for those, and progressively higher for users that already have many points (gaining 90 when you already have 2350 is a minor thing, while gaining 90 when you start with 70 is huge;-). Whether that helps depends on "short head vs long tail" issues, i.e., does a lot of the scoring happen for very active users and not so much for the "long tail" of less-active ones?
Alex Martelli
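The progressive threshold from the comment above could be sketched like this in Python, assuming the threshold is simply a fraction of the user's current score. The `fraction` and `floor` parameters are illustrative knobs, not recommended values.

```python
def recompute_threshold(current_score, fraction=0.1, floor=1):
    """Points a user must gain before friends' boards are recomputed.

    `fraction` and `floor` are illustrative tuning knobs (assumptions).
    """
    return max(floor, int(current_score * fraction))


def should_recompute(current_score, gained):
    """True when the gain is large relative to what the user already has."""
    return gained >= recompute_threshold(current_score)
```

With these defaults, gaining 90 points on top of 70 triggers a recompute, while gaining 90 on top of 2350 does not.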
A: 

There is a Python library available for storing rankings:

http://code.google.com/p/google-app-engine-ranklist/