ansaurus

Question

Answer 1

+3 A:

This is the greatest-n-per-group problem that comes up frequently on Stack Overflow.

Here's my usual answer:

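select
  p.name    as player,
  s.date    as first_score,
  s.points  as points
from players p
join scores s
  on s.player_id = p.id
-- s2 matches any earlier score by the same player
left outer join scores s2
  on  s2.player_id = p.id
  and s2.date < s.date
-- keep s only when no earlier score was found
where s2.player_id is null;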

In other words, given score s, try to find a score s2 for the same player, but with an earlier date. If no earlier score is found, then s is the earliest one.
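To see the query at work, here is a minimal sketch that runs it against made-up data. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the sample players and scores are invented for illustration; only the table and column names come from the query above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table players (id integer primary key, name text);
    create table scores (
        id integer primary key,
        player_id integer references players(id),
        date text,
        points integer
    );
    -- invented sample data: two scores per player
    insert into players (id, name) values (1, 'alice'), (2, 'bob');
    insert into scores (player_id, date, points) values
        (1, '2010-06-01', 10),
        (1, '2010-06-03', 25),
        (2, '2010-06-02', 7),
        (2, '2010-06-05', 40);
""")

# Keep a score s only if no earlier score s2 exists for the same player.
first_scores = conn.execute("""
    select p.name, s.date, s.points
    from players p
    join scores s on s.player_id = p.id
    left outer join scores s2
      on s2.player_id = p.id and s2.date < s.date
    where s2.player_id is null
    order by p.name
""").fetchall()

print(first_scores)
```

Each player comes back once, with the date and points of their earliest score.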

Re your comment about ties: you need a policy for which row to use in case of a tie. One possibility, if you use auto-incrementing primary keys, is to treat the row with the lowest key value as the earlier one. See the additional term in the outer join below:

select
  p.name    as player,
  s.date    as first_score,
  s.points  as points
from players p
join scores s
  on s.player_id = p.id
-- s2 matches any score that sorts before s: earlier date, or same date with a lower id
left outer join scores s2
  on  s2.player_id = p.id
  and (s2.date < s.date or (s2.date = s.date and s2.id < s.id))
where s2.player_id is null;

Basically you need to add tiebreaker terms until you get down to a column that's guaranteed to be unique, at least for the given player. The primary key of the table is often the best solution, but I've seen cases where another column was suitable.
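The effect of the tiebreaker can be checked with a small sketch. It assumes Python's sqlite3 module in place of MySQL and invented sample rows in which one player has two scores on the first day; the `QUERY` template and variable names are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table players (id integer primary key, name text);
    create table scores (id integer primary key, player_id integer, date text, points integer);
    insert into players (id, name) values (1, 'alice');
    -- invented data: two scores on the first day, one later
    insert into scores (id, player_id, date, points) values
        (1, 1, '2010-06-01', 10),
        (2, 1, '2010-06-01', 30),
        (3, 1, '2010-06-04', 50);
""")

# Same query shape both times; only the "s2 sorts before s" condition changes.
QUERY = """
    select p.name, s.id, s.date, s.points
    from players p
    join scores s on s.player_id = p.id
    left outer join scores s2
      on s2.player_id = p.id and ({earlier})
    where s2.player_id is null
    order by s.id
"""

date_only = conn.execute(QUERY.format(earlier="s2.date < s.date")).fetchall()
with_tiebreak = conn.execute(QUERY.format(
    earlier="s2.date < s.date or (s2.date = s.date and s2.id < s.id)"
)).fetchall()

print(len(date_only))      # both first-day rows survive the date-only comparison
print(len(with_tiebreak))  # the id tiebreaker leaves a single row
```

Comparing on date alone returns both first-day rows; adding the id term narrows that to the row with the lowest id.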

Regarding the comments I shared with @OMG Ponies, remember that this type of query benefits hugely from the right index.
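The answer doesn't say which index is the right one, so the following is only a guess: a compound index on the two columns the s2 join compares, `player_id` and `date`. The index name `scores_player_date` is made up, and the sketch uses Python's sqlite3 module as a stand-in, so the plan it prints is SQLite's, not MySQL's. In MySQL you would run `EXPLAIN` on the real query to confirm the index is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table players (id integer primary key, name text);
    create table scores (id integer primary key, player_id integer, date text, points integer);
    -- hypothetical index: columns in the order the join to s2 uses them
    create index scores_player_date on scores (player_id, date);
""")

index_names = [row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'index' and tbl_name = 'scores'"
)]

# Ask SQLite how it would execute the query; the last column describes each step.
plan = conn.execute("""
    explain query plan
    select p.name, s.date, s.points
    from players p
    join scores s on s.player_id = p.id
    left outer join scores s2
      on s2.player_id = p.id and s2.date < s.date
    where s2.player_id is null
""").fetchall()

for step in plan:
    print(step[-1])
```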

Bill Karwin 2010-06-25 18:26:58

OMG Ponies 2010-06-25 18:30:22

@Bill Karwin, if my `join scores s...` has more join conditions than `s.player_id = p.id`, would I copy all of those conditions for the `left outer join scores s2...` as well?

macek 2010-06-25 18:38:29

@OMG Ponies: I have found that using GROUP BY in MySQL is a performance killer, because MySQL almost always creates a temp table. With the outer join solution (or the equivalent NOT EXISTS with a correlated subquery), it's possible to use covering indexes, so the join may be done in memory.

Bill Karwin 2010-06-25 18:40:58
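The NOT EXISTS form mentioned in the comment above isn't spelled out in the thread, so here is one way it might look, checked against the outer join version. This is a sketch using Python's sqlite3 module as a stand-in for MySQL, with invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table players (id integer primary key, name text);
    create table scores (id integer primary key, player_id integer, date text, points integer);
    insert into players (id, name) values (1, 'alice'), (2, 'bob');
    insert into scores (player_id, date, points) values
        (1, '2010-06-01', 10), (1, '2010-06-03', 25),
        (2, '2010-06-02', 7),  (2, '2010-06-05', 40);
""")

# Outer join form from the answer.
outer_join_rows = conn.execute("""
    select p.name, s.date, s.points
    from players p
    join scores s on s.player_id = p.id
    left outer join scores s2
      on s2.player_id = p.id and s2.date < s.date
    where s2.player_id is null
    order by p.name
""").fetchall()

# Correlated subquery form: keep s when no earlier score exists for the same player.
not_exists_rows = conn.execute("""
    select p.name, s.date, s.points
    from players p
    join scores s on s.player_id = p.id
    where not exists (
        select 1 from scores s2
        where s2.player_id = s.player_id and s2.date < s.date
    )
    order by p.name
""").fetchall()

print(outer_join_rows == not_exists_rows)
```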

@macek: Yes, the join to s2 must use the same conditions as the join to s, plus the one about comparing dates. And if you have the possibility of ties (more than one score on the same date), you may need an extra join term to resolve the tie.

Bill Karwin 2010-06-25 18:43:17
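To illustrate why the s2 join needs the same conditions, here is a sketch with a hypothetical extra column, `game_id`, which is not part of the original schema. It uses Python's sqlite3 module in place of MySQL and invented rows. Leaving the extra condition off the s2 join lets a score from another game count as "earlier", which wrongly eliminates every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table players (id integer primary key, name text);
    -- game_id is a hypothetical extra column used only for this example
    create table scores (id integer primary key, player_id integer, game_id integer,
                         date text, points integer);
    insert into players (id, name) values (1, 'alice');
    insert into scores (player_id, game_id, date, points) values
        (1, 1, '2010-06-01', 10),
        (1, 2, '2010-06-03', 25),
        (1, 2, '2010-06-07', 60);
""")

# Correct: the game_id condition appears on both s and s2.
same_conditions = conn.execute("""
    select p.name, s.date, s.points
    from players p
    join scores s on s.player_id = p.id and s.game_id = 2
    left outer join scores s2
      on s2.player_id = p.id and s2.game_id = 2 and s2.date < s.date
    where s2.player_id is null
""").fetchall()

# Incorrect: s2 is missing the game_id condition, so the game 1 score hides the answer.
missing_condition = conn.execute("""
    select p.name, s.date, s.points
    from players p
    join scores s on s.player_id = p.id and s.game_id = 2
    left outer join scores s2
      on s2.player_id = p.id and s2.date < s.date
    where s2.player_id is null
""").fetchall()

print(same_conditions)
print(missing_condition)
```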

@Bill Karwin, you're exactly right! I'm getting multiple rows returned for some users because they have about 2-5 scores that fall on the first day they play. How do I resolve that?

macek 2010-06-25 18:47:59

Answer 2

A:

Most RDBMSs won't even let you include non-aggregate columns in your SELECT clause when using GROUP BY. In MySQL, you'll end up with values from arbitrary rows for your non-aggregate columns. This is useful if you actually have the same value in a particular column for all the rows in a group. So it's nice that MySQL doesn't restrict us, but it's important to understand this behavior.

A whole chapter is devoted to this in SQL Antipatterns.

Marcus Adams 2010-06-25 18:30:56
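For comparison, one way to use GROUP BY without selecting non-aggregate columns is to compute the earliest date per player in a derived table and join back to scores for the other columns. This is a different technique from the outer join in the accepted answer, sketched here with Python's sqlite3 module as a stand-in for MySQL and invented data. The alias `f` and column `first_date` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table players (id integer primary key, name text);
    create table scores (id integer primary key, player_id integer, date text, points integer);
    insert into players (id, name) values (1, 'alice'), (2, 'bob');
    insert into scores (player_id, date, points) values
        (1, '2010-06-01', 10), (1, '2010-06-03', 25),
        (2, '2010-06-02', 7),  (2, '2010-06-05', 40);
""")

# The derived table selects only the grouping column and an aggregate,
# then the outer query fetches points from the matching scores row.
rows = conn.execute("""
    select p.name, s.date, s.points
    from players p
    join (
        select player_id, min(date) as first_date
        from scores
        group by player_id
    ) f on f.player_id = p.id
    join scores s
      on s.player_id = f.player_id and s.date = f.first_date
    order by p.name
""").fetchall()

print(rows)
```

Like the date-only outer join, this returns more than one row per player when several scores share the earliest date.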

Thanks Marcus! :) Also, you can make MySQL follow the standard more closely with `SET SQL_MODE = 'ONLY_FULL_GROUP_BY'`

Bill Karwin 2010-06-25 18:37:50

Coincidentally, @Bill Karwin (the writer of the accepted answer for this very question) happens to be the author of that book! Small world :)

macek 2010-06-25 18:40:50


Join single row from a table in MySQL
