Not sure how to ask a followup on SO, but this is in reference to an earlier question: http://stackoverflow.com/questions/94930/fetch-one-row-per-account-id-from-list

The query I'm working with is:

SELECT *
FROM scores s1
WHERE accountid NOT IN (SELECT accountid FROM scores s2 WHERE s1.score < s2.score)
ORDER BY score DESC

This selects the top scores and is meant to limit the results to one row per accountid: that account's top score.

The last hurdle is that this query is returning multiple rows for accountids that have multiple occurrences of their top score. So if accountid 17 has scores of 40, 75, 30, 75 the query returns both rows with scores of 75.
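To make the problem easy to reproduce, here is a minimal sketch using Python's built-in sqlite3 module. Only accountid and score come from the question; the scoreid key, the second account (42), and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3

# Assumed minimal schema: accountid and score are from the question,
# scoreid is a hypothetical surrogate key so duplicate rows stay distinguishable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (scoreid INTEGER PRIMARY KEY, accountid INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO scores (accountid, score) VALUES (?, ?)",
    [(17, 40), (17, 75), (17, 30), (17, 75), (42, 60)],
)

# The query from the question, unchanged.
rows = conn.execute(
    """
    SELECT *
    FROM scores s1
    WHERE accountid NOT IN (SELECT accountid FROM scores s2 WHERE s1.score < s2.score)
    ORDER BY score DESC
    """
).fetchall()

# Account 17 shows up twice because its top score of 75 occurs twice.
print([(accountid, score) for _, accountid, score in rows])
```

With this data the query returns three rows: both 75s for account 17 and the single 60 for account 42.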

Can anyone modify this query (or provide a better one) to fix this case, and truly limit it to one row per account id?

Thanks again!

A: 

If you are selecting a subset of columns then you can use the DISTINCT keyword to filter results.

SELECT DISTINCT accountid, score
FROM scores s1
WHERE accountid NOT IN (SELECT accountid FROM scores s2 WHERE s1.score < s2.score)
ORDER BY score DESC
Josh
Distinct is also limiting what I can get from the row to just the accountid+score. I need all fields.
Kenzie
A: 

Does your database support distinct? As in select distinct x from y?

dacracot
+2  A: 
select accountid, max(score) from scores group by accountid;
Paul Tomblin
Much better than my answer :)
Josh
I should have been clearer. I need the whole row, not just the accountid+score. Max is not giving me what I need.
Kenzie
+1  A: 

If your RDBMS supports them, then an analytic function would be a good approach particularly if you need all the columns of the row.

select ...
from   (
   select accountid,
          score,
          ...
          row_number() over 
               (partition by accountid
                    order by score desc) score_rank
   from   scores)
where score_rank = 1;

The row returned is indeterminate in the case you describe, but you can easily modify the analytic function, for example by ordering on (score desc, test_date desc) to get the more recent of two matching high scores.
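As a rough illustration of that tie-break, the sketch below runs the row_number() approach through Python's sqlite3 module. It assumes SQLite 3.25 or later (the first version with window functions), a hypothetical test_date column stored as ISO date text, and made-up sample rows; the derived table is also given an alias (ranked), which some databases require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# test_date is a hypothetical column; ISO-8601 text sorts chronologically.
conn.execute("CREATE TABLE scores (accountid INTEGER, score INTEGER, test_date TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        (17, 40, "2008-09-01"),
        (17, 75, "2008-09-02"),
        (17, 30, "2008-09-03"),
        (17, 75, "2008-09-04"),
        (42, 60, "2008-09-01"),
    ],
)

# Rank each account's rows by score, breaking ties with the most recent date,
# then keep only the first-ranked row per account.
top_rows = conn.execute(
    """
    SELECT accountid, score, test_date
    FROM (
        SELECT accountid, score, test_date,
               ROW_NUMBER() OVER (
                   PARTITION BY accountid
                   ORDER BY score DESC, test_date DESC
               ) AS score_rank
        FROM scores
    ) AS ranked
    WHERE score_rank = 1
    ORDER BY accountid
    """
).fetchall()

print(top_rows)  # [(17, 75, '2008-09-04'), (42, 60, '2008-09-01')]
```

Because row_number() never assigns the same number twice within a partition, this returns exactly one row per account even if the tie-break columns themselves tie; in that case which of the tied rows wins is again indeterminate.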

Other analytic functions based on rank will achieve a similar purpose.

If you don't mind duplicates then the following would probably be more efficient than your current method:

select ...
from   (
   select accountid,
          score,
          ...
          max(score) over (partition by accountid) max_score
   from   scores)
where score = max_score;
David Aldridge
+1  A: 

If you're only interested in the accountid and the score, then you can use the simple GROUP BY query given by Paul above.

SELECT accountid, MAX(score) 
FROM scores 
GROUP BY accountid;

If you need other attributes from the scores table, then you can fetch the whole row with a query like the following:

SELECT s1.*
FROM scores AS s1
  LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid 
    AND s1.score < s2.score)
WHERE s2.accountid IS NULL;

But this still gives multiple rows in cases like your example, where a given accountid has two scores matching its maximum value. To further reduce the result set to a single row, for example the row with the latest gamedate, try this:

SELECT s1.*
FROM scores AS s1
  LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid 
    AND s1.score < s2.score)
  LEFT OUTER JOIN scores AS s3 ON (s1.accountid = s3.accountid 
    AND s1.score = s3.score AND s1.gamedate < s3.gamedate) 
WHERE s2.accountid IS NULL
    AND s3.accountid IS NULL;
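
For anyone who wants to try this quickly, here is a small sketch that runs the same two-join query through Python's sqlite3 module. The table layout, the ISO text format for gamedate, and the sample rows are assumptions; only the column names come from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout; gamedate is stored as ISO-8601 text so '<' compares chronologically.
conn.execute("CREATE TABLE scores (accountid INTEGER, score INTEGER, gamedate TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        (17, 40, "2008-09-01"),
        (17, 75, "2008-09-02"),
        (17, 30, "2008-09-03"),
        (17, 75, "2008-09-04"),
        (42, 60, "2008-09-01"),
    ],
)

# s2 finds any higher score for the same account; s3 finds any later row with
# the same score. A row survives only when neither join finds a match.
latest_top = conn.execute(
    """
    SELECT s1.*
    FROM scores AS s1
      LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid
        AND s1.score < s2.score)
      LEFT OUTER JOIN scores AS s3 ON (s1.accountid = s3.accountid
        AND s1.score = s3.score AND s1.gamedate < s3.gamedate)
    WHERE s2.accountid IS NULL
      AND s3.accountid IS NULL
    ORDER BY s1.accountid
    """
).fetchall()

print(latest_top)  # [(17, 75, '2008-09-04'), (42, 60, '2008-09-01')]
```

Note that if two rows for an account tie on both score and gamedate, both still come back, so the tie-break column needs to be unique per account for this to guarantee a single row.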
Bill Karwin
That last one works great, and actually seems clearer to me despite its length. Thanks!
Kenzie
Two self-joins? On Oracle in particular a large join is going to perform best with a hash join, and that needs equi-joins. I think that for a 100-million-row table this query would perform very poorly, so I'll mark it down for that reason.
David Aldridge
Yes, depending on the RDBMS implementation and the volume of data, joins can be costly. Another option would be to filter out duplicates in the application after fetching the matching rows. But the question was about how to do it in SQL.
Bill Karwin
A: 

This solution works in MS SQL, giving you the whole row.

SELECT *
FROM scores
WHERE scoreid in
(
  SELECT max(scoreid)
  FROM scores as s2
    JOIN
  (
    SELECT max(score) as maxscore, accountid
    FROM scores s1
    GROUP BY accountid
  ) sub ON s2.score = sub.maxscore AND s2.accountid = sub.accountid
  GROUP BY s2.score, s2.accountid
)
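
A quick way to sanity-check this idea outside MS SQL is the sketch below, which runs it against SQLite from Python. It assumes scoreid is a unique key on scores and uses invented sample rows; the join condition refers to sub.accountid, since the alias s1 is only visible inside the derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: scoreid is a unique, increasing key on the scores table.
conn.execute(
    "CREATE TABLE scores (scoreid INTEGER PRIMARY KEY, accountid INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO scores (accountid, score) VALUES (?, ?)",
    [(17, 40), (17, 75), (17, 30), (17, 75), (42, 60)],
)

# Inner derived table: each account's maximum score.
# Middle query: the highest scoreid among rows holding that maximum.
# Outer query: the full row for each chosen scoreid.
one_per_account = conn.execute(
    """
    SELECT *
    FROM scores
    WHERE scoreid IN (
      SELECT MAX(s2.scoreid)
      FROM scores AS s2
        JOIN (
          SELECT MAX(score) AS maxscore, accountid
          FROM scores s1
          GROUP BY accountid
        ) sub ON s2.score = sub.maxscore AND s2.accountid = sub.accountid
      GROUP BY s2.score, s2.accountid
    )
    ORDER BY accountid
    """
).fetchall()

print(one_per_account)  # [(4, 17, 75), (5, 42, 60)]
```

Since scoreid is unique, picking MAX(scoreid) always settles the tie, so each account ends up with exactly one row.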
David B