views:

736

answers:

3

Here's a real noobish MySQL query problem I'm having.

I have a high score table in a game I'm writing. The high score DB records a name, level, and score achieved. There are many near duplicates in the db. For example:

Name | Level | Score | Timestamp (key)
Bob    2       41    | 1234567.890
Bob    3       15    | 1234568.890
Bob    3       20    | 1234569.890
Joe    2       40    | 1234561.890
Bob    3       21    | 1234562.890
Bob    3       21    | 1234563.890

I want to return a "highest level achieved" high score list, with an output similar to:

Name | Level | Score
Bob    3       21
Joe    2       40

The SQL Query I currently use is:

SELECT *, MAX(level) as level 
FROM highscores 
GROUP BY name
ORDER BY level DESC, score DESC
LIMIT 5

However this doesn't quite work. The "Score" field output always seems to be randomly pulled from the group, instead of taking the corresponding score for the highest level achieved. Eg:

Name | Level | Score
Bob    3       41
Joe    2       40

Bob never got 41 points on level 3! How can I fix this?

+1  A: 

You'll need to use a subquery to pull the score out.

select distinct
    name, 
    max(level) as level,
    (select max(score) from highscores h2 
        where h2.name = h1.name and h2.level = h1.level) as score
from highscores h1 
group by name 
order by level desc, score desc

Cheers,

Eric

It irks me that I didn't take the time to explain why this is the case when I posted the answer, so here goes:

When you pull back everything (*), and then the max level, what you'll get is each record sequentially, plus a column with the max level on it. Note that you're not grouping by score (which would have given you Bob 2 41, and Bob 3 21--two records for our friend Bob).

So, how the heck do we fix this? You need to do a subquery to additionally filter your results, which is what that (select max(score)...) is. Now, for each row that reads Bob, you will get his max level (3), and his max score at that level (21). But, this still gives us however many rows Bob has (e.g.-if he has 5 rows, you'll get 5 rows of Bob 3 21). To limit this to only the top score, we need to use a DISTINCT clause in the select statement to only return unique rows.

UPDATE: Correct SQL (can't comment on le dorfier's post):

SELECT h1.Name, h1.Level, MAX(h1.Score)
    FROM highscores h1
    LEFT OUTER JOIN highscores h2 ON h1.name = h2.name AND h1.level < h2.level
    LEFT OUTER JOIN highscores h3 ON h1.name = h3.name AND h2.level = h3.level AND h1.score < h3.score
    WHERE h2.Name IS NULL AND h3.Name IS NULL
    GROUP BY h1.Name, h1.Level
Eric
Thanks for the explanation Eric - it makes total sense to me, but the code didn't quite work and I can't spot where the problem is. The code now consistently returns Bob's score for his LOWEST level, not his maximum one!Before it seemed to be the first score in the database, which was bad. So this is much better - just not perfect yet :)
Andy Moore
I think you need one more line at the end: GROUP BY h1.Name, h1.Level
Adam Bernier
You're right--didn't copy all of it when I pasted it in. Thanks!
Eric
I slapped an ORDER BY at the end of that update and it worked great, thanks so much eric!
Andy Moore
Glad you got it working! Do you understand it?
Eric
Ooooh wait a minute. It seemed to work ok at first, but after a few applications of it, it seems to be messing up a little. I have a certain high score table where there are 6 usernames in the level 1 category, and it's only showing 3 names on this output. :(
Andy Moore
Actually it did work fine, I'm just going to have to make a new question for the advanced portion of things :)
Andy Moore
+1  A: 

This is efficient.

SELECT h1.Name, h1.Level, h1.Score
FROM highscores h1
LEFT JOIN highscores h2 ON h1.name = h2.name AND h1.level < h2.level
LEFT JOIN highscores h3 ON h1.name = h3.name AND h1.level = h3.level AND h1.score < h3.score
WHERE h2.id IS NULL AND h3.id IS NULL

You're looking for the level/score for which there is no higher level for that user, and no higher score that that level.

le dorfier
I don't have an "id" column, but my key is a microtimestamp. I just swapped in the "timestamp" column name and this query returns zero results for some reason. :/
Andy Moore
Try "WHERE h2.name IS NULL AND h3.name IS NULL
le dorfier
Or if it's new enough version of MySQL you can use WHERE NOT EXISTS (SELECT 1 FROM highscores WHERE h1.name = name AND h1.level < level) AND NOT EXISTS (SELECT 1 FROM highscores WHERE h1.name = name AND h1.level = level AND h1.score < score)
le dorfier
You don't have duplicates (or nulls) for timestamps in any rows do you?
le dorfier
A: 

Interesting problem. Here's another solution:

SELECT hs.name, hs.level, MAX(score) AS score
FROM highscores hs
INNER JOIN (
  SELECT name, MAX(level) AS level FROM highscores GROUP BY name
) hl ON hl.name = hs.name AND hl.level = hs.level
GROUP BY hs.name, hs.level;

Personally, I find this the easiest to understand, and my hunch is that it will be relatively efficient for the database to execute.

I like the above query best, but just for kicks... I find the following one amusing in a kludgey sort of way. Assuming score can't exceed 99999...

SELECT name, level, score
FROM highscores hs
INNER JOIN (
  SELECT name, MAX(level * 100000 + score) AS hfactor
  FROM highscores GROUP BY name
) hf ON hf.hfactor = hs.level * 100000 + hs.score AND hf.name = hs.name;
Indolent Code Monkey
Hm, thanks mmarrk, this is a lot clearer - though possibly less efficient than the other examples? I can't see a difference in my small database benchmarks.
Andy Moore
It shouldn't be less efficient (than the first one I listed at least; the second was a joke). The query doesn't necessarily get executed literally, as you see it. Regardless of the method chosen, the query engine is probably going to have to do some nested loops joins. Depending on the method you choose, you may trade some additional nested loops here for an aggregation operation there, but do some benchmarks (maybe add more data to get a real test) and/or run the queries through EXPLAIN to compare efficiency. In the end, all things being equal, pick the simplest one.
Indolent Code Monkey