tags:

views:

373

answers:

4

I have inherited a table with a structure something like this:

ID   Name   Timestamp   Data
----------------------------
1    A      40          ...
2    A      30          ...
3    A      20          ...
4    B      40          ...
5    B      20          ...
6    C      30          ...
7    C      20          ...
8    C      10          ...

ID is an identity field and the primary key and there are non-unique indexes on the Name and Timestamp fields.

What is the most efficient way to get the most recent record for each item name, i.e. in the table above rows 1,4 and 6 should be returned as they are the most up-to-date entries for items A,B and C respectively.

+5  A: 

SQL Server 2005 (onwards):

WITH MostRecentRows AS
(
    SELECT ID, Name, Data
    ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeStamp DESC) AS 'RowNumber'
    FROM MySchema.MyTable
) 
SELECT * FROM MostRecentRows 
WHERE RowNumber = 1
Mitch Wheat
+1 exactly what I was thinking about as well.
Sung Meister
+3  A: 

Assuming there are no duplicate timestamps per name, something like this should work:

SELECT ID, Name, Timestamp, Data
FROM test AS o
WHERE o.Timestamp = (SELECT MAX(Timestamp)
                     FROM test as i
                     WHERE i.name = o.name)
Jon
+1 for Correlated Subqueries
Sung Meister
Yes, this would only work if there are no duplicate timestamps. The CTE description is definately safer since it will only return one row.
Registered User
+2  A: 

SQL Server 2000:

SELECT
  ID, Name, Timestamp, Data
FROM
  DataTable
  INNER JOIN
  (
     SELECT ID, MAX(Timestamp) Timestamp FROM DataTable GROUP BY ID
  ) latest ON 
    DataTable.ID = Latest.ID AND 
    DataTable.Timestamp = Latest.Timestamp
Tomalak
+1 for an alternative SQL Server 2000 solution
Sung Meister
This could return more than one record if there are multiple records for one time stamp.
Registered User
I am perfectly aware of this fact. An additional outer GROUP BY clause migrates this risk, should duplicate timestamps be possible.
Tomalak
A: 

If you are using SQL Server 2005/2008, then the CTE solution already listed by Mitch Weat is the best from a performance perspective. However, if you are using SQL Server 2000, then you can't assume there aren't duplicate Name | TimeStamp combinations. Use the following code to return only one record per name:

SELECT ID , Name , TimeStamp , Data FROM DataTable dt INNER JOIN (SELECT Name , MIN(DataTable.ID) AS MinimumID FROM DataTable
INNER JOIN
(SELECT Name , MAX(Timestamp) AS Timestamp FROM DataTable GROUP BY Name) latest ON DataTable.Name = Latest.Name AND DataTable.Timestamp = Latest.Timestamp GROUP BY Name) MinimumLatest ON dt.ID = MinimumLatest.ID

So if you add another record like 9 C 30, then this will only return ID 6. If you don't go this far, then you may end up return 9 C 30 and 6 C 30.

Registered User