views:

852

answers:

2

On a PHP & CodeIgniter-based web site, users can earn reputation for various actions, not unlike Stack Overflow. Every time reputation is awarded, a new entry is created in a MySQL table with the user_id, action being rewarded, and value of that bunch of points (e.g. 10 reputation). At the same time, a field in a users table, reputation_total, is updated.

Since all this is sort of meaningless without a frame of reference, I want to show users their percentile rank among all users. For total reputation, that seems easy enough. Let's say my user_id is 1138. Just count the number of users in the users table with a reputation_total less than mine, count the total number of users, and divide to find the percentage of users with a lower reputation than mine. That'll be user 1138's percentile rank, right? Easy!

But I'm also displaying reputation totals over different time spans--e.g., earned in the past seven days, which involves querying the reputation table and summing all my points earned since a given date. I'd also like to show percentile rank for the different time spans--e.g., I may be 11th percentile overall, but 50th percentile this month and 97th percentile today.

It seems I would have to go through and find the reputation totals of all users for the given time span, and then see where I fall within that group, no? Is that not awfully cumbersome? What's the best way to do this?

Many thanks.

+2  A: 

I don't see why that would be too overly complex. Generally all you would need is to add to your WHERE clause a query that limits results like:

WHERE DatePosted between @StartOfRange and @EndOfRange
TheTXI
Not complex, no--but I'd have to do that for every single user, wouldn't I? And while it wouldn't be complex, wouldn't that be... well... slow?
Toph
What do you mean for every single user? I can't really comment too much seeing as how we have no idea what your actual queries look like.
TheTXI
A: 

I can think of a few options off the top of my head here:

  1. As you mentioned, total up the reputation points earned during the time range and calculate the percentile ranks based on that.

  2. Track updates to reputation_total on a daily basis - so you have a table with user_id, date, reputation_total.

  3. Add some new columns to the user table (reputation_total, reputation_total_today, reputation_total_last30days, etc) for each time range. You could also normalize this into a separate table (reputation_totals) to prevent you from having to add a new column for each time span you want to track.

Option #1 is the easiest, but it's probably going to get slow if you have lots of rows in your reputation transaction table - it won't scale very well, especially if you need to calculate these in real time.

Option #2 is going to require more storage over time (one row per user per day) but would probably be significantly faster than querying the transaction table directly.

Option #3 is less flexible, but would likely be the fastest option.

Both options 2 & 3 would likely require a batch process to calculate the totals on a daily basis, so that's something to consider as well.

I don't think any option is necessarily the best - they all involve different tradeoffs of speed/storage space/complexity/flexibility. What you do will ultimately depend on the requirements for your application of course.

Eric Petroelje