tags:

views:

245

answers:

3

Imagine the following schema and sample data (SQL Server 2008):

OriginatingObject
----------------------------------------------
ID
1
2
3

ValueSet
----------------------------------------------
ID   OriginatingObjectID   DateStamp
1    1                     2009-05-21 10:41:43
2    1                     2009-05-22 12:11:51
3    1                     2009-05-22 12:13:25
4    2                     2009-05-21 10:42:40
5    2                     2009-05-20 02:21:34
6    1                     2009-05-21 23:41:43
7    3                     2009-05-26 14:56:01

Value
----------------------------------------------
ID   ValueSetID   Value
1    1            28
etc (a set of rows for each related ValueSet)

I need to obtain the ID of the most recent ValueSet record for each OriginatingObject. Do not assume that the higher the ID of a record, the more recent it is.

I am not sure how to use GROUP BY properly in order to make sure the set of results grouped together to form each aggregate row includes the ID of the row with the highest DateStamp value for that grouping. Do I need to use a subquery or is there a better way?

+4  A: 

You can do it with a correlated subquery or using IN with multiple columns and a GROUP-BY.

Please note, simple GROUP-BY can only bring you to the list of OriginatingIDs and Timestamps. In order to pull the relevant ValueSet IDs, the cleanest solution is use a subquery.

Multiple-column IN with GROUP-BY (probably faster):

SELECT O.ID, V.ID
FROM Originating AS O, ValueSet AS V
WHERE O.ID = V.OriginatingID
AND
(V.OriginatingID, V.DateStamp) IN
(
    SELECT OriginatingID, Max(DateStamp)
    FROM ValueSet
    GROUP BY OriginatingID
)

Correlated Subquery:

SELECT O.ID, V.ID
FROM Originating AS O, ValueSet AS V
WHERE O.ID = V.OriginatingID
AND
V.DateStamp =
(
    SELECT Max(DateStamp)
    FROM ValueSet V2
    WHERE V2.OriginatingID = O.ID
)
Roee Adler
Note this may return duplicate ViewSets in case you have two most-recent ValueSets with the exact same timestamp (however this may be the required behavior).
Roee Adler
...and if not, a Max(V.ID) and extra group by will suffice.
Benjol
Thanks, though I know how to do it using a subquery, I was wondering if there was a better way to do it using a group by clause. Any ideas?
Nathan Ridley
Added a solution using GROUP-BY, but it will required a sub-query...
Roee Adler
Cheers, that will do I think.
Nathan Ridley
+1  A: 
SELECT OriginatingObjectID, id
FROM (
    SELECT id, OriginatingObjectID, RANK() OVER(PARTITION BY OriginatingObjectID 
                                    ORDER BY DateStamp DESC) as ranking
    FROM ValueSet)
WHERE ranking = 1;
Galghamon
+1 That one works too.
Nathan Ridley
A: 

This can be done with a correlated sub-query. No GROUP-BY necessary.

SELECT
  vs.ID,
  vs.OriginatingObjectID,
  vs.DateStamp,
  v.Value
FROM
  ValueSet vs
  INNER JOIN Value v ON v.ValueSetID = vs.ID 
WHERE
  NOT EXISTS (
    SELECT 1 
    FROM   ValueSet 
    WHERE  OriginatingObjectID = vs.OriginatingObjectID 
           AND DateStamp > vs.DateStamp
  )

This works only if there can not be two equal DateStamps for a OriginatingObjectID in the ValueSet table.

Tomalak
Unfortunately there may be two equal date values
Nathan Ridley
Too bad. Then one additional outer GROUP BY would be required. I have given my +1 to Rax Olgud already, his answer covers both approaches pretty well.
Tomalak