views:

35

answers:

1

My Schema

I have the following tables

table             notes/example values
------------------------------------------------
users (
  id
  email           # "[email protected]"
)                 

games (           
  id              
  name            # "Space Invaders", "Asteroids", "Centipede"
)                 

players (         
  id              
  name            # "uber dude"
  user_id         # player belongs to user
  game_id         # player belongs to game
)                 

scores (          
  id              
  player_id       # belongs to one player
  value           # 50
  created_at      # "2010-09-10",   "2010-08-05"
  month           # "2010-09",      "2010-08"
)

I need to create two reports.

1) Top players

Best performing players (sum all scores for each player) for the most recent 4 months. Show top 10 for each month.

    2010-07         2010-08           2010-09    2010-10
 1  plyA 5,000 pts  plyB  9,400 pts   ...        ...
    Centipede       Solitaire

 2  plyB 3,600 pts  plyC  8,200 pts   ...        ...
    Asteroids       Centipede       

 3  plyC 2,900 pts  plyA  7,000 pts   ...        ...
    Centipede       Centipede

 4  ...             ...               ...        ...
 5  ...             ...               ...        ...
 6  ...             ...               ...        ...
 7  ...             ...               ...        ...
 8  ...             ...               ...        ...
 9  ...             ...               ...        ...
10  ...             ...               ...        ...

2) Top Users:

Best performing users (sum all scores for each players for each user) for the most recent 4 months. Show top 10 for each month.

    2010-07           2010-08             2010-09    2010-10
 1  userA 50,000 pts  userB 51,400 pts    ...        ...
 2  userB 40,500 pts  userA 39,300 pts    ...        ...
 3  userC 40,200 pts  userC 37,000 pts    ...        ...
 4  ...               ...                 ...        ...
 5  ...               ...                 ...        ...
 6  ...               ...                 ...        ...
 7  ...               ...                 ...        ...
 8  ...               ...                 ...        ...
 9  ...               ...                 ...        ...
10  ...               ...                 ...        ...

MySQL View helper

For joining purposes, I have a stored view to help query the months for the reports. It will always return the most recent 4 months.

report_months (
  month
)

SELECT * FROM report_months;

2010-07
2010-08
2010-09
2010-10

The Problem

In report #1, for example, I can get the sums pretty easily.

select
  p.name        as player_name,
  g.name        as game_name,
  s.month       as month,
  sum(s.score)  as sum_score

from players  as p

join games    as g
  on g.id = p.game_id

join scores   as s
  on s.player_id = p.id

join report_months as rm  -- handy view helper
  on rm.month = s.month

group by
  p.name, g.name

order by
  sum(s.score) desc

-- I can't do this :(
-- limit 0, 40

However, I can't simply fetch the top 40 results and spread them across 4 months as this wouldn't guarantee me 10 for each month.

The Question

How can I modify my query to ensure that I'd get 10 for each month?

+2  A: 

I wouldn't try to make an SQL query that tabulates by month as you have shown.

Instead, query the top 10 players per month as rows, not as columns:

Month    Rank  Player  TotalScore  Game
2010-07     1    plyA   5,000 pts  Centipede
2010-07     2    plyB   3,600 pts  Asteroids
2010-07     3    plyC   2,900 pts  Centipede
...
2010-08     1    plyB   9,400 pts  Solitaire
2010-08     2    plyC   8,200 pts  Centipede
2010-08     3    plyA   7,000 pts  Centipede
...

This becomes a greatest-n-per-group problem, where n is 10.

CREATE VIEW PlayerScoresByMonth AS
  SELECT month, player_id, SUM(value) AS score
  FROM scores
  GROUP BY month, player_id;

SELECT s1.month, COUNT(s2.month)+1 AS Rank, s1.player_id, s1.score AS TotalScore
FROM PlayerScoresByMonth s1
LEFT OUTER JOIN PlayerScoresByMonth s2 ON s1.month = s2.month 
  AND (s1.score < s2.score OR s1.score = s2.score AND s1.player_id < s2.player_id)
GROUP BY s1.month, s1.player_id
HAVING COUNT(*) < 10
ORDER BY s1.month, Rank;

(that's untested but should get you started)

Then you need to write some application code to fetch the results of this query and separate the lists by month, and present the data however you were going to do that.

Bill Karwin
@Bill Karwin, this is incredible. It is exactly what I was looking for. Thank you so much :)
macek
**note:** In the view, I added `JOIN report_months AS rm ON rm.month = scores.month` to limit the rows returned in the view.
macek
Great, I'm glad I could help. And that join to report_months is a fine thing to do.
Bill Karwin
@Bill Karwin, do MySQL views use indexes? If I were to cache the `PlayerScoresByMonth` in a table instead of a view, what indexes would I want to add to speed up the `LEFT OUTER JOIN` you're using?
macek
You can't create an index on a view, but the view gets translated into a query against the base table(s), so it makes use of an applicable index on the base table. See http://dev.mysql.com/doc/refman/5.1/en/view-restrictions.html
Bill Karwin
@Bill Karwin, very awesome; I did not know that. What indexes should I have on my `scores` table to make this the most efficient?
macek
See http://www.slideshare.net/billkarwin/mentor-your-indexes
Bill Karwin
First I'd try a composite index over `(month, score, player_id)` but as I say in the slides, be sure to use EXPLAIN and profiling to make sure any index is being used.
Bill Karwin