tags:

views:

64

answers:

3

Let's say that I have the following result table from my SQL select:

DocumentId        CreationDate       InstanceId
ABC               10th Jan           0c60f4e2-02fc-4244-9ec5-4d259ea5774d
ABC               11th Jan           2168ab5d-d6ca-4db3-90f0-b621d72108b8
BCA               4th Jan            cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8

Notice that InstanceId is different for every now returned; it's essentially the primary key for the table.

How would I modify my select so that I returned only one row per DocumentId, picking the 'newest' (determined by CreationDate), making sure that the InstanceId on the row is the correct one.

So the example results above would instead return:

DocumentId        CreationDate       InstanceId
ABC               11th Jan           2168ab5d-d6ca-4db3-90f0-b621d72108b8
BCA               4th Jan            cb7cdf24-b50f-4bd9-b2b5-d58a14793dd8

(btw, sorry for the horrid question title, feel free to change it to something more suitable)

+1  A: 

Example for Oracle:

SELECT DISTINCT
       DocumentId,
       FIRST_VALUE(CreationDate)
          OVER (PARTITION BY DocumentId
                ORDER BY CreationDate DESC) AS CreationDate,
       FIRST_VALUE(InstanceId)
          OVER (PARTITION BY DocumentId
                ORDER BY CreationDate DESC) AS InstanceId
FROM   mytable;

Obviously the results will be indeterminate if there are duplicate documents with exactly the same CreationDate.

Jeffrey Kemp
If I understand your query correctly, it assumes that a newer CreationDate implies a higher InstanceId thus MAX(CreationDate) always coresponds to MAX(InstanceID). I believe that could not always be true (depeding on the InstanceId generator).
AlexDrenea
@AlexDrenea: He's selecting the InstanceId based on "order by CREATIONDATE desc". It will work, even if there are two documents with the same creationdate. Then will return the one with the highest InstanceId.
Andomar
@Andomar: I did not say it won't work for 2 documents with same CreationDate. I said that the assumption that a newer CreationDate implies a higher InstanceID could be wrong. The question clearly states "making sure that the InstanceId on the row is the correct one" so the query should not be done considering such assumtions.
AlexDrenea
No, the query makes NO assumptions about InstanceID - it selects the correct InstanceId irrespective of whatever values that column takes, because (as Andomar has explained) it orders by CreationDate, not InstanceId.
Jeffrey Kemp
aak that was wrong, sorry - I'm correcting it now
Jeffrey Kemp
+1  A: 

Here's a version for SQL Server. Basically you join the table with the table resulted by grouping the rows by the DocumentId column and getting max(creationDate). Using these 2 columns as where condition to get the Code value. Basically the primary key for the required select is DocumentName and CreationDate. Those uniquely (or should uniquiely) identify the row you are trying to select. To obtain that key we create a second(temp) table with a select and a groupBy clause. We join that table with the original one and use the info for selection.

SELECT
     mt2.DocumentId
    ,mt2.CreationDate
    ,mt1.InstanceId
FROM
    myTable    mt1
    inner join (SELECT 
                     DocumentId  DocumentId
                    ,MAX(CreationDate)  CreationDate
             FROM       
                     myTable
             GROUP BY 
                     DocumentId
               )mt2  on  mt2.DocumentId = mt1.DocumentId 
                     and mt2.CreationDate = mt1.CreationDate
ORDER BY mt2.DocumentId

The creationDate should be unique per DocumentId for the query to run flawlessly. If you need more entries per day per Document, you could consider lowering CreationDate granularity (e.g. add Time component)

AlexDrenea
A: 

This should work on most databases:

SELECT
    cur.DocumentId, cur.CreationDate, cur.InstanceId
FROM
    DocumentVersions cur
LEFT OUTER JOIN
    DocumentVersions next
    ON next.DocumentId = cur.DocumentId
    AND next.CreationDate > cur.CreationDate 
WHERE
    next.DocumentId is null

It joins the Document table against itself, searching for a document with the same id and a higher CreationDate. The where statement says the document with the higher date must not be found, effectively filtering on the newest document per DocumentId.

If there might be multiple documents with the same creation date, you can select the one with the highest InstanceId like:

SELECT
    cur.DocumentId, cur.CreationDate, max(cur.InstanceId)
FROM
    DocumentVersions cur
LEFT OUTER JOIN
    DocumentVersions next
    ON next.DocumentId = cur.DocumentId
    AND next.CreationDate > cur.CreationDate 
WHERE
    next.DocumentId is null
GROUP BY
    cur.DocumentId, cur.CreationDate
Andomar
Will this work if there is only one row for the particular documentId (i.e. the BCA document in my example)?
SCdF
Yeah it should work, the join will not find any document later than the one document, and it'll show in the result.
Andomar