Q:

TSQL Select Max

Userid   FirstName   LastName        UserUpdate 
1        Dan         Kramer          1/1/2005  
1        Dan         Kramer          1/1/2007  
1        Dan         Kramer          1/1/2009  
2        Pamella     Slattery        1/1/2005  
2        Pam         Slattery        1/1/2006  
2        Pam         Slattery        1/1/2008  
3        Samamantha  Cohen           1/1/2008  
3        Sam         Cohen           1/1/2009  

I need to extract the latest update for each of these users. Basically, here's what I'm looking for:

Userid   FirstName   LastName        UserUpdate  
1        Dan         Kramer          1/1/2009     
2        Pam         Slattery        1/1/2008   
3        Sam         Cohen           1/1/2009  

Now when I run the following:

SELECT Userid, FirstName, LastName, MAX(UserUpdate) AS MaxDate
FROM [Table]
GROUP BY Userid, FirstName, LastName

I still get duplicates, something like this:

Userid   FirstName   LastName        UserUpdate 
1        Dan         Kramer          1/1/2009  
2        Pamella     Slattery        1/1/2005  
2        Pam         Slattery        1/1/2008  
3        Samamantha  Cohen           1/1/2008  
3        Sam         Cohen           1/1/2009 
A:

You aren't getting duplicates. From the database's perspective, 'Pam' is not equal to 'Pamella'; the fact that one is a colloquial shortening of the other means nothing to the database engine. There really is no reliable, universal way to do this: some names have multiple abbreviations (like "Rob" or "Bob" for "Robert"), some abbreviations can suit multiple names (like "Kel" for "Kelly" or "Kelsie"), and names can also have alternate spellings.

For your simple example, you could simply select and group by SUBSTRING(FirstName, 1, 3) instead of FirstName, but that's just a coincidence based upon your sample data; other name abbreviations would not fit this pattern.
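To see both halves of that point, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for SQL Server (so `substr` replaces T-SQL's `SUBSTRING`, and the dates are rewritten as ISO-8601 text so that `MAX()` sorts them chronologically). The prefix grouping collapses the question's sample data to one row per user, while a hypothetical user 4 ("Robert"/"Bob", not in the question) shows the trick breaking down.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO-8601 text so that
# MAX() over a TEXT column orders them chronologically.
ROWS = [
    (1, "Dan", "Kramer", "2005-01-01"),
    (1, "Dan", "Kramer", "2007-01-01"),
    (1, "Dan", "Kramer", "2009-01-01"),
    (2, "Pamella", "Slattery", "2005-01-01"),
    (2, "Pam", "Slattery", "2006-01-01"),
    (2, "Pam", "Slattery", "2008-01-01"),
    (3, "Samamantha", "Cohen", "2008-01-01"),
    (3, "Sam", "Cohen", "2009-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Userid INTEGER, FirstName TEXT, "
             "LastName TEXT, UserUpdate TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", ROWS)

# Group on the first three letters of FirstName instead of the whole name
# (T-SQL: SUBSTRING(FirstName, 1, 3); SQLite: substr(FirstName, 1, 3)).
QUERY = """
    SELECT Userid, substr(FirstName, 1, 3) AS NamePrefix, LastName,
           MAX(UserUpdate) AS MaxDate
    FROM Users
    GROUP BY Userid, substr(FirstName, 1, 3), LastName
    ORDER BY Userid, NamePrefix
"""
sample_result = conn.execute(QUERY).fetchall()
print(sample_result)

# Hypothetical extra user: "Bob" is short for "Robert" but shares no prefix
# with it, so the same query splits this one user into two groups.
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)",
                 [(4, "Robert", "Jones", "2005-01-01"),
                  (4, "Bob", "Jones", "2008-01-01")])
with_bob = conn.execute(QUERY).fetchall()
print(with_bob[3:])
```

The first query returns one row each for Dan, Pam, and Sam; after the hypothetical rows are added, user 4 comes back twice, once as `Bob` and once as `Rob`.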

Adam Robinson
And let's not forget that people misspell names, too... that will always add to the difficulty!
Leslie
Well sure, but I was thinking: could I just get the data by UserId (forget about the first/last names), associate the UserId with the latest date, and then pull the first and last name based on that association? It sounds like I need to add another table, but unfortunately I can't.
firedrawndagger
Do it by UserId, not name
Mike M.
@firedrawndagger: Then just change your group by to `UserID` and use `MAX(FirstName)` for `FirstName` and `MAX(LastName)` for `LastName`. This is *terrible* design, though, as you 1) are violating second normal form (by storing the first and last names with each detail record instead of in a master record with a foreign key) and 2) risking getting a first and last name that don't go together (since there's nothing stopping you from having a completely different first and last name with the same user ID...which is the reason for second normal form to begin with).
Adam Robinson
@Mike - that's exactly what I'm doing, but I need the GROUP BY to incorporate the UserId, FirstName, and LastName. @Adam - wow, I never thought of it that way. Yes, as far as I'm concerned the DB is far from normalized; there's no argument there. Now, with a MAX on LastName, FirstName, and the date, would there ever be an issue where the MAX on the date does not return the latest date because of the MAX on LastName and FirstName? Is there a certain order? I thought SQL decides on its own how to run the query, so there's no way to specify which MAX goes first.
firedrawndagger
@firedrawndagger: The specific aggregates have no effect on one another. `MAX(FirstName)` will return the "maximum" `FirstName` within the group (so for a given `UserID`), `MAX(LastName)` will do the same for `LastName`, irrespective of whatever the `FirstName` might be. Same goes for `MAX(UserUpdate)`. The issue with mismatching is that, for example, the `MAX(FirstName)` might not be on the same record as `MAX(UserUpdate)`. For example, you'd get back `Pamella`, `Slattery`, and `1/1/2008`, even though `Pamella` goes with a record from 2005 and the 2008 record has `Pam` for the `FirstName`.
Adam Robinson
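A minimal sketch of that mismatch, again using Python's `sqlite3` module as a stand-in for SQL Server with the question's sample rows (dates assumed to be ISO-8601 text). Grouping by `Userid` alone with one `MAX()` per column computes each aggregate independently.

```python
import sqlite3

# The question's sample rows, dates as ISO-8601 text.
ROWS = [
    (1, "Dan", "Kramer", "2005-01-01"),
    (1, "Dan", "Kramer", "2007-01-01"),
    (1, "Dan", "Kramer", "2009-01-01"),
    (2, "Pamella", "Slattery", "2005-01-01"),
    (2, "Pam", "Slattery", "2006-01-01"),
    (2, "Pam", "Slattery", "2008-01-01"),
    (3, "Samamantha", "Cohen", "2008-01-01"),
    (3, "Sam", "Cohen", "2009-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Userid INTEGER, FirstName TEXT, "
             "LastName TEXT, UserUpdate TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", ROWS)

# One MAX() per column, grouped only by Userid. Each aggregate scans the
# group on its own, so the three values need not come from the same row.
rows = conn.execute("""
    SELECT Userid, MAX(FirstName), MAX(LastName), MAX(UserUpdate)
    FROM Users
    GROUP BY Userid
    ORDER BY Userid
""").fetchall()

for row in rows:
    print(row)
```

`'Pamella'` sorts after `'Pam'` (a string is greater than its own prefix), so user 2 comes back as `Pamella` with the 2008 date, and user 3 as `Samamantha` with the 2009 date, even though both of those names appear only on older rows.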
OK, that makes sense. It's still a problem, but it makes sense.
firedrawndagger
@firedrawndagger: Unfortunately, this problem is completely unsolvable unless you have a way to uniquely identify a row in your table.
Adam Robinson
A:

try:

declare @Table table (userid int,firstname varchar(10),lastname varchar(20), userupdate datetime)
INSERT @Table VALUES (1, 'Dan'         ,'Kramer'          ,'1/1/2005')  
INSERT @Table VALUES (1, 'Dan'         ,'Kramer'          ,'1/1/2007')  
INSERT @Table VALUES (1, 'Dan'         ,'Kramer'          ,'1/1/2009')  
INSERT @Table VALUES (2, 'Pamella'     ,'Slattery'        ,'1/1/2005')  
INSERT @Table VALUES (2, 'Pam'         ,'Slattery'        ,'1/1/2006')  
INSERT @Table VALUES (2, 'Pam'         ,'Slattery'        ,'1/1/2008')  
INSERT @Table VALUES (3, 'Samamantha'  ,'Cohen'           ,'1/1/2008')
INSERT @Table VALUES (3, 'Sam'         ,'Cohen'           ,'1/1/2009') 

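-- dt: the latest UserUpdate for each Userid
SELECT
    dt.Userid,dt.MaxDate
        -- MIN() collapses to one name if two rows tie on the latest date
        ,MIN(a.FirstName) AS FirstName, MIN(a.LastName) AS LastName
    FROM (SELECT 
              Userid, Max(UserUpdate) AS MaxDate 
              FROM @Table GROUP BY Userid
         ) dt
        -- join back to the base table to fetch the names on that latest row
        INNER JOIN @Table a ON dt.Userid=a.Userid and dt.MaxDate =a.UserUpdate
    GROUP BY dt.Userid,dt.MaxDate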

OUTPUT:

Userid      MaxDate                 FirstName  LastName
----------- ----------------------- ---------- --------------------
1           2009-01-01 00:00:00.000 Dan        Kramer
2           2008-01-01 00:00:00.000 Pam        Slattery
3           2009-01-01 00:00:00.000 Sam        Cohen
KM
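The same derived-table-plus-join idea can be tried without SQL Server. This sketch uses Python's `sqlite3` module as a stand-in (the table variable becomes an ordinary table, and dates are ISO-8601 text). It also adds a hypothetical second row for user 1 on the latest date, not in the question, to show why the outer `GROUP BY` with `MIN()` is there.

```python
import sqlite3

# The question's sample rows, dates as ISO-8601 text.
ROWS = [
    (1, "Dan", "Kramer", "2005-01-01"),
    (1, "Dan", "Kramer", "2007-01-01"),
    (1, "Dan", "Kramer", "2009-01-01"),
    (2, "Pamella", "Slattery", "2005-01-01"),
    (2, "Pam", "Slattery", "2006-01-01"),
    (2, "Pam", "Slattery", "2008-01-01"),
    (3, "Samamantha", "Cohen", "2008-01-01"),
    (3, "Sam", "Cohen", "2009-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Userid INTEGER, FirstName TEXT, "
             "LastName TEXT, UserUpdate TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", ROWS)

# dt holds the latest UserUpdate per Userid; joining back to Users fetches the
# names on that row, and the outer GROUP BY + MIN() keeps one row per user.
QUERY = """
    SELECT dt.Userid, dt.MaxDate,
           MIN(a.FirstName) AS FirstName, MIN(a.LastName) AS LastName
    FROM (SELECT Userid, MAX(UserUpdate) AS MaxDate
          FROM Users
          GROUP BY Userid) AS dt
    INNER JOIN Users AS a
        ON dt.Userid = a.Userid AND dt.MaxDate = a.UserUpdate
    GROUP BY dt.Userid, dt.MaxDate
    ORDER BY dt.Userid
"""
result = conn.execute(QUERY).fetchall()
print(result)

# Hypothetical tie: a second row for user 1 on the same latest date.
conn.execute("INSERT INTO Users VALUES (1, 'Daniel', 'Kramer', '2009-01-01')")
tied = conn.execute(QUERY).fetchall()
print(tied)
```

Both runs return one row per user. Because each `MIN()` is evaluated independently, a tie can still pair a first name from one tied row with a last name from another; that only matters when the tied rows disagree.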
I do like that solution... however, I also discovered that some date fields might be NULL, which means those records get excluded. What should I do in that case?
firedrawndagger
What fields might be NULL? You did not provide a table definition. And how would you like to handle that? It is your app, so you decide how to handle NULLs, and I'll help write the code. If you leave it up to me, I choose to ignore any problems related to NULLs ;-)
KM
@KM good point - the NULL would be in the update date, which gets set whenever the record is updated. Otherwise, if it's a new record, the update date is NULL by default. I would ignore the NULLs as well... but the problem is that I have to hook up the User table to another table that doesn't output the user's first and last name, only their ID number.
firedrawndagger
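If `UserUpdate` is NULL for records that were never updated, the derived-table join drops any user whose rows are *all* NULL: `MAX()` ignores NULLs and returns NULL, and `NULL = NULL` is never true in a join condition. Here is a minimal sketch of one workaround, using Python's `sqlite3` module as a stand-in (`IFNULL` plays the role of T-SQL's `ISNULL`) and a hypothetical user 4 with a single NULL-dated row. The sentinel `'0001-01-01'` is an assumption: it makes a never-updated row lose to any dated row; use a value above every real date instead if NULL should count as "newest".

```python
import sqlite3

# The question's sample rows (ISO-8601 dates) plus a hypothetical user 4
# whose only row has never been updated.
ROWS = [
    (1, "Dan", "Kramer", "2005-01-01"),
    (1, "Dan", "Kramer", "2009-01-01"),
    (2, "Pamella", "Slattery", "2005-01-01"),
    (2, "Pam", "Slattery", "2008-01-01"),
    (3, "Samamantha", "Cohen", "2008-01-01"),
    (3, "Sam", "Cohen", "2009-01-01"),
    (4, "Ann", "Lee", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Userid INTEGER, FirstName TEXT, "
             "LastName TEXT, UserUpdate TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", ROWS)

SENTINEL = "0001-01-01"  # assumption: sorts before every real UserUpdate

NAIVE = """
    SELECT dt.Userid, dt.MaxDate, MIN(a.FirstName), MIN(a.LastName)
    FROM (SELECT Userid, MAX(UserUpdate) AS MaxDate
          FROM Users GROUP BY Userid) AS dt
    JOIN Users AS a
      ON dt.Userid = a.Userid AND dt.MaxDate = a.UserUpdate
    GROUP BY dt.Userid, dt.MaxDate
    ORDER BY dt.Userid
"""

# Substitute the sentinel on BOTH sides of the join, then turn it back into
# NULL for display with NULLIF.
NULL_SAFE = """
    SELECT dt.Userid, NULLIF(dt.MaxDate, :s), MIN(a.FirstName), MIN(a.LastName)
    FROM (SELECT Userid, MAX(IFNULL(UserUpdate, :s)) AS MaxDate
          FROM Users GROUP BY Userid) AS dt
    JOIN Users AS a
      ON dt.Userid = a.Userid AND dt.MaxDate = IFNULL(a.UserUpdate, :s)
    GROUP BY dt.Userid, dt.MaxDate
    ORDER BY dt.Userid
"""

naive = conn.execute(NAIVE).fetchall()
null_safe = conn.execute(NULL_SAFE, {"s": SENTINEL}).fetchall()
print(naive)      # user 4 is missing
print(null_safe)  # user 4 is back, with a NULL date
```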
@firedrawndagger, if this is a table that tracks users editing their settings, I'd just add a Status column, where "A" = active and "D" = deleted. Each user would have only one "A" row, which would be the most recent, and your query would then be simple, with no aggregate needed to find the most recent row. Another strategy would be to have a Version column: each current row would be 0, and historic versions would be 1 to n. This way you can store userid+version if you need an FK to a point in time, but if you want an FK to the current row, just use the userid, like `where userid=@x and Version=0`
KM
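A minimal sketch of the Status-column design, using Python's `sqlite3` module as a stand-in for SQL Server. The helper names `save_user` and `current_user` are hypothetical. The partial unique index enforces "only one `A` row per user"; SQL Server 2008 and later offer the equivalent as a filtered unique index (`CREATE UNIQUE INDEX ... WHERE Status = 'A'`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (
        Userid     INTEGER NOT NULL,
        FirstName  TEXT,
        LastName   TEXT,
        UserUpdate TEXT,
        Status     TEXT NOT NULL CHECK (Status IN ('A', 'D'))
    );
    -- At most one active ('A') row per user.
    CREATE UNIQUE INDEX one_active_row ON Users (Userid) WHERE Status = 'A';
""")

def save_user(conn, userid, first, last, updated):
    """Hypothetical helper: retire the current row, insert the new one as active."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute("UPDATE Users SET Status = 'D' "
                     "WHERE Userid = ? AND Status = 'A'", (userid,))
        conn.execute("INSERT INTO Users VALUES (?, ?, ?, ?, 'A')",
                     (userid, first, last, updated))

def current_user(conn, userid):
    """Hypothetical helper: no aggregate needed, the active row is the latest."""
    return conn.execute(
        "SELECT Userid, FirstName, LastName, UserUpdate FROM Users "
        "WHERE Userid = ? AND Status = 'A'", (userid,)).fetchone()

for row in [(2, "Pamella", "Slattery", "2005-01-01"),
            (2, "Pam", "Slattery", "2006-01-01"),
            (2, "Pam", "Slattery", "2008-01-01")]:
    save_user(conn, *row)

print(current_user(conn, 2))

# The partial index rejects a second active row for the same user.
rejected = False
try:
    conn.execute("INSERT INTO Users VALUES (2, 'Pam', 'Slattery', '2010-01-01', 'A')")
except sqlite3.IntegrityError:
    rejected = True
print("second active row rejected:", rejected)
```

The history rows stay in the table with Status `D`, so nothing is lost, and doing the retire-and-insert in one transaction means the two steps succeed or fail together.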
A:

Or use a subquery...

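DECLARE @Now datetime
SET @Now = GETDATE()  -- evaluate once so both ISNULL() calls below use the same value

SELECT
   a.userID,
   a.FirstName,
   a.LastName,
   b.MaxDate
FROM
      myTable a
   INNER JOIN
      (   SELECT
             UserID,
             Max(ISNULL(UserUpdate,@Now)) as MaxDate
          FROM
             myTable
          GROUP BY
             UserID
      ) b
   ON
          a.UserID = b.UserID
      -- NULL = anything is never true, so apply the same substitution here;
      -- otherwise rows with a NULL UserUpdate can never match b.MaxDate
      AND ISNULL(a.UserUpdate,@Now) = b.MaxDate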

The subquery (named "b") returns the following:

Userid   MaxDate  
1        1/1/2009     
2        1/1/2008   
3        1/1/2009 

The INNER JOIN between the subquery and the original table filters the original table down to matching records: only rows whose UserID/UserUpdate pair matches a UserID/MaxDate pair from the subquery are returned. As long as no user has two rows sharing the same latest UserUpdate, this gives you the unduplicated result set you were looking for:

Userid   FirstName   LastName        UserUpdate  
1        Dan         Kramer          1/1/2009     
2        Pam         Slattery        1/1/2008   
3        Sam         Cohen           1/1/2009  
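One more caveat with the join approach: if a user has two rows sharing the latest `UserUpdate`, both rows come back. The sketch below uses Python's `sqlite3` module as a stand-in and a hypothetical tied row (`Pamela`, not in the question) to contrast the join with `ROW_NUMBER()`, a technique not used in the answers here. SQL Server 2005 and later accept the same `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` syntax; SQLite needs version 3.25 or later. The `FirstName` tiebreaker is an arbitrary choice.

```python
import sqlite3

# The question's sample rows (ISO-8601 dates) plus a hypothetical row that
# ties with user 2's latest update.
ROWS = [
    (1, "Dan", "Kramer", "2005-01-01"),
    (1, "Dan", "Kramer", "2009-01-01"),
    (2, "Pamella", "Slattery", "2005-01-01"),
    (2, "Pam", "Slattery", "2006-01-01"),
    (2, "Pam", "Slattery", "2008-01-01"),
    (2, "Pamela", "Slattery", "2008-01-01"),
    (3, "Samamantha", "Cohen", "2008-01-01"),
    (3, "Sam", "Cohen", "2009-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Userid INTEGER, FirstName TEXT, "
             "LastName TEXT, UserUpdate TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", ROWS)

# Join against the per-user maximum: every row that hits the maximum survives.
JOIN_QUERY = """
    SELECT a.Userid, a.FirstName, a.LastName, a.UserUpdate
    FROM Users AS a
    JOIN (SELECT Userid, MAX(UserUpdate) AS MaxDate
          FROM Users GROUP BY Userid) AS b
      ON a.Userid = b.Userid AND a.UserUpdate = b.MaxDate
    ORDER BY a.Userid, a.FirstName
"""

# Number each user's rows newest-first and keep only row 1, so exactly one
# whole row (names and date together) is returned per user.
ROW_NUMBER_QUERY = """
    SELECT Userid, FirstName, LastName, UserUpdate
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY Userid
                                    ORDER BY UserUpdate DESC, FirstName) AS rn
          FROM Users) AS ranked
    WHERE rn = 1
    ORDER BY Userid
"""

joined = conn.execute(JOIN_QUERY).fetchall()
ranked = conn.execute(ROW_NUMBER_QUERY).fetchall()
print(joined)  # user 2 appears twice
print(ranked)  # one row per user
```

Because `ROW_NUMBER()` picks a single row rather than aggregating columns separately, the first name, last name, and date it returns always belong together.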

Of course, this is just a work-around. If you really want to solve the problem for the long-term, you should normalize your original table by splitting it into two.

Table1:

Userid   FirstName   LastName 
1        Dan         Kramer   
2        Pam         Slattery 
3        Sam         Cohen

Table2:

Userid   UserUpdate  
1        1/1/2007     
2        1/1/2007   
3        1/1/2007  
1        1/1/2008     
2        1/1/2008   
3        1/1/2008 
1        1/1/2009     
2        1/1/2009   
3        1/1/2009 

This would be a more standard way to store data, and would be much easier to query (without having to resort to a subquery). In that case, the query would look like this:

SELECT
   T1.UserID,
   T1.FirstName,
   T1.LastName,
   MAX(ISNULL(T2.UserUpdate,GETDATE()))
FROM
      Table1 T1
   LEFT JOIN
      Table2 T2
   ON
      T1.UserID = T2.UserID
GROUP BY
   T1.UserID,
   T1.FirstName,
   T1.LastName
dave
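A minimal sketch of the normalized version, using Python's `sqlite3` module as a stand-in (`IFNULL` and `date('now')` play the roles of `ISNULL` and `GETDATE()`). It loads the Table1/Table2 sample data above plus a hypothetical user 4 with no rows in Table2, to show that the `LEFT JOIN` keeps such users and the NULL substitution fills in their date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (UserID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Table2 (UserID INTEGER REFERENCES Table1 (UserID), UserUpdate TEXT);
""")

# Users from the answer, plus hypothetical user 4 who has no updates yet.
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)",
                 [(1, "Dan", "Kramer"), (2, "Pam", "Slattery"),
                  (3, "Sam", "Cohen"), (4, "Ann", "Lee")])

# Table2 sample rows in the same order as above, dates as ISO-8601 text.
conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                 [(uid, d) for d in ("2007-01-01", "2008-01-01", "2009-01-01")
                  for uid in (1, 2, 3)])

# Names live in one place, so grouping by them can no longer split a user.
QUERY = """
    SELECT T1.UserID, T1.FirstName, T1.LastName,
           MAX(IFNULL(T2.UserUpdate, date('now'))) AS MaxDate
    FROM Table1 AS T1
    LEFT JOIN Table2 AS T2 ON T1.UserID = T2.UserID
    GROUP BY T1.UserID, T1.FirstName, T1.LastName
    ORDER BY T1.UserID
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Users 1 to 3 get their latest 2009 date; user 4 has only the NULL produced by the `LEFT JOIN`, which is replaced with today's date.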
I saw your comment to KM about NULL fields... I'll assume the only field that can be NULL is UserUpdate. In that case, just use: `ISNULL(UserUpdate, GETDATE())`
dave
@dave that sounds like an idea... what exactly would that look like using your query?
firedrawndagger
The subquery is treated just like another table. I posted an example above -- hope that helps.
dave
Oh -- you're asking about the ISNULL function -- sorry -- I've added it to my query above.
dave