views:

16729

answers:

5

Given this data set:

ID  Name            City            Birthyear
1   Egon Spengler   New York        1957
2   Mac Taylor      New York        1955
3   Sarah Connor    Los Angeles     1959
4   Jean-Luc Picard La Barre        2305
5   Ellen Ripley    Nostromo        2092
6   James T. Kirk   Riverside       2233
7   Henry Jones     Chicago         1899

I need to find the 3 oldest persons, but only one of every city.

If it would just be the three oldest, it would be...

  • Henry Jones / Chicago
  • Mac Taylor / New York
  • Egon Spengler / New York

However since both Egon Spengler and Mac Taylor are located in New York, Egon Spengler would drop out and the next one (Sarah Connor / Los Angeles) would come in instead.

Any elegant solutions?

Update:

Currently a variation of PConroy is the best/fastest solution:

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 ON P2.Birthyear = P.Birthyear
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

His original query with "IN" is extremly slow with big datasets (aborted after 5 minutes), but moving the subquery to a JOIN will speed it up a lot. It took about 0.15 seconds for approx. 1 mio rows in my test environment. I have an index on "City, Birthyear" and a second one just on "Birthyear".

Note: This is related to...

+1  A: 

Something like that?

SELECT
  Id, Name, City, Birthyear
FROM
  TheTable
WHERE
  Id IN (SELECT TOP 1 Id FROM TheTable i WHERE i.City = TheTable.City ORDER BY Birthyear)
Tomalak
This is T-SQL, I know. Should be easily adaptable to MySQL or any other dialect.
Tomalak
MySQL doesn't support TOP/LIMIT in "IN"-Subqueries
BlaM
+1  A: 

This is probably not the most elegant and quickest solution, but it should work. I am looking forward the see the solutions of real database gurus.

select p.* from people p,
(select city, max(age) as mage from people group by city) t
where p.city = t.city and p.age = t.mage
order by p.age desc
DrJokepu
This is closest to what I would have done, even though I would try to introduce the ID somewhere because there might be two people with the same age in the same city.
BlaM
BTW: In that case it does not matter which of the two persons is choosen. It should just be "just one".
BlaM
+5  A: 

Probably not the most elegant of solutions, and the performance of IN may suffer on larger tables.

The nested query gets the minimum Birthyear for each city. Only records who have this Birthyear are matched in the outer query. Ordering by age then limiting to 3 results gets you the 3 oldest people who are also the oldest in their city (Egon Spengler drops out..)

SELECT Name, City, Birthyear, COUNT(*) AS ct
FROM table
WHERE Birthyear IN (SELECT MIN(Birthyear)
               FROM table
               GROUP by City)
GROUP BY City
ORDER BY Birthyear DESC LIMIT 3;

+-----------------+-------------+------+----+
| name            | city        | year | ct |
+-----------------+-------------+------+----+
| Henry Jones     | Chicago     | 1899 | 1  |
| Mac Taylor      | New York    | 1955 | 1  |
| Sarah Connor    | Los Angeles | 1959 | 1  |
+-----------------+-------------+------+----+

Edit - added GROUP BY City to outer query, as people with same birth years would return multiple values. Grouping on the outer query ensures that only one result will be returned per city, if more than one person has that minimum Birthyear. The ct column will show if more than one person exists in the city with that Birthyear

ConroyP
This would work with the example data given, but "in real life" there might be two people with the same birth year in the table. This query would return all of them. :(
BlaM
Good point, grouping by city should sort that, answer updated now. If you've 2 people in the same city with the same birthyear, you'll only see one of them - probably the first entered, depending on the default sort mysql is using on your table.
ConroyP
Actually you will not. You will get an error Message, because you cannot use GROUP BY without having any aggregate functions on the "non-grouped" columns.
BlaM
Hmm. Tried it here on mysql 5.0.51 and it works fine. You could put in a count(*) on the outer select to avoid this bug, which would also show you if you'd >1 people in a city with the same birthyear. Updated answer with this extra bit
ConroyP
Uh... It actually works. Wow. I thought GROUP BY always needs aggregate functions, but seems that I was wrong...
BlaM
+1  A: 

Not pretty but should work also with multiple people with the same dob:

Test data:

select id, name, city, dob 
into people
from
(select 1 id,'Egon Spengler' name, 'New York' city , 1957 dob
union all select 2, 'Mac Taylor','New York', 1955
union all select 3, 'Sarah Connor','Los Angeles', 1959
union all select 4, 'Jean-Luc Picard','La Barre', 2305
union all select 5, 'Ellen Ripley','Nostromo', 2092
union all select 6, 'James T. Kirk','Riverside', 2233
union all select 7, 'Henry Jones','Chicago', 1899
union all select 8, 'Blah','New York', 1955) a

Query:

select 
    * 
from 
    people p
    left join people p1
    ON 
     p.city = p1.city
     and (p.dob > p1.dob and p.id <> p1.id)
     or (p.dob = p1.dob and p.id > p1.id)
where
    p1.id is null
order by 
    p.dob
kristof
This one worked, too. But with lots of rows in the table, it gets really slooooow :)
BlaM
A: 

@BlaM

UPDATED just found that its good to use USING instead of ON. it will remove duplicate columns in result.

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT City, MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 USING(Birthyear, City)
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

ORIGINAL POST

hi, i've tried to use your updated query but i was getting wrong results until i've added extra condition to join (also extra column into join select). transfered to your query, i'am using this:

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT City, MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 ON P2.Birthyear = P.Birthyear AND P2.City = P.City
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

in theory you should not need last GROUP BY P.City, but i've left it there for now, just in case. will probably remove it later.

gondo