views:

208

answers:

4

I'm looking for a way to handle the following scenario. I have a database table that I need to return only one record for each "group id" that is contained within the table, furthermore the record that is selected within each group should be the oldest person in the household.

ID   Group ID   Name               Age
1   134        John Bowers        37
2   134        Kerri Bowers       33
3   135        John Bowers        44
4   135        Shannon Bowers     42

So in the sample data provided above I would need ID 1 and 3 returned, as they are the oldest people within each group id.

This is being queried against a SQL Server 2005 database.

+3  A: 
SELECT  t.*
FROM    (
        SELECT  DISTINCT groupid
        FROM    mytable
        ) mo
CROSS APPLY
        (
        SELECT  TOP 1 *
        FROM    mytable mi
        WHERE   mi.groupid = mo.groupid
        ORDER BY
                age DESC
        ) t

or this:

SELECT  *
FROM    (
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY groupid ORDER BY age DESC) rn
        FROM    mytable
        )
WHERE   rn = 1

This will return at most one record per group even in case of ties.

See this article in my blog for performance comparisons of both methods:

Quassnoi
+1: Yeah, forgot my disclaimer about the ties. Too busy fighting fires.
OMG Ponies
Thanks Quassnoi. I was able to add the gender column in addtion to the age column in your ORDER By clause and get the results that I was looking for! (The gender column was only discussed in a comment after my initial question) Your solution is perfect, and adaptable!
Richard West
Follow up question. These is going to be run against 175 million records. Is one or the other query more efficient?
Richard West
`@Richard West`: how many distinct groups do you have? Do you have a separate table that would contain the groups?
Quassnoi
A: 

SELECT GroupID, Name, Age FROM table INNER JOIN ( SELECT GroupID, MAX(Age) AS OLDEST FROM table ) AS OLDESTPEOPLE ON table.GroupID = OLDESTPEOPLE.GroupID AND table.Age = OLDESTPEOPLE.OLDEST

Chris Simmons
+2  A: 

Use:

SELECT DISTINCT
       t.groupid,
       t.name
  FROM TABLE t
  JOIN (SELECT t.groupid,
               MAX(t.age) 'max_age'
          FROM TABLE t
      GROUP BY t.groupid) x ON x.groupid = t.groupid
                           AND x.max_age = t.age

So what if there's 2+ people with the same age for a group? It'd be better to store the birthdate rather than age - you can always calculate the birthdate for presentation.

OMG Ponies
Beware of the ties!
Quassnoi
A: 

Try this (assuming Group is synonym for Household)

Select * From Table t
Where Age = (Select Max(Age)
             From Table 
             Where GroupId = t.GroupId)

If there are two or more "oldest" people in some household (They all are the same age and there is noone else older), then this will return all of them, not just one at random.

If this is an issue, then you need to add another subquery to return an arbitrary key value for one person in that set.

Select * From Table t
Where Id = 
  (Select Max(Id) Fom Table 
   Where GroupId = t.GroupId
      And Age =
         (Select(Max(Age) From Table
          Where GroupId = t.GroupId))
Charles Bretana