Given this data set:
ID Name City Birthyear
1 Egon Spengler New York 1957
2 Mac Taylor New York 1955
3 Sarah Connor Los Angeles 1959
4 Jean-Luc Picard La Barre 2305
5 Ellen Ripley Nostromo 2092
6 James T. Kirk Riverside 2233
7 Henry Jones Chicago 1899
I need to find the 3 oldest persons, but only one of every city.
If it would just be the three oldest, it would be...
- Henry Jones / Chicago
- Mac Taylor / New York
- Egon Spengler / New York
However since both Egon Spengler and Mac Taylor are located in New York, Egon Spengler would drop out and the next one (Sarah Connor / Los Angeles) would come in instead.
Any elegant solutions?
Update:
Currently a variation of PConroy is the best/fastest solution:
SELECT P.*, COUNT(*) AS ct
FROM people P
JOIN (SELECT MIN(Birthyear) AS Birthyear
FROM people
GROUP by City) P2 ON P2.Birthyear = P.Birthyear
GROUP BY P.City
ORDER BY P.Birthyear ASC
LIMIT 10;
His original query with "IN" is extremly slow with big datasets (aborted after 5 minutes), but moving the subquery to a JOIN will speed it up a lot. It took about 0.15 seconds for approx. 1 mio rows in my test environment. I have an index on "City, Birthyear" and a second one just on "Birthyear".
Note: This is related to...