tags:

views:

519

answers:

5

I have a parent table with entries for documents and I have a history table which logs an audit entry every time a user accesses one of the documents.

I'm writing a search query to return a list of documents (filtered by various criteria) with the latest user id to access each document returned in the result set.

Thus for


    DOCUMENTS
    ID | NAME
    1  | Document 1
    2  | Document 2
    3  | Document 3
    4  | Document 4
    5  | Document 5

    HISTORY
    DOC_ID | USER_ID | TIMESTAMP
    1      | 12345   | TODAY
    1      | 11111   | IN THE PAST
    1      | 11111   | IN THE PAST
    1      | 12345   | IN THE PAST
    2      | 11111   | TODAY
    2      | 12345   | IN THE PAST
    3      | 12345   | IN THE PAST

I'd be looking to get a return from my search like


    ID | NAME       | LAST_USER_ID
    1  | Document 1 | 12345
    2  | Document 2 | 11111
    3  | Document 3 | 12345
    4  | Document 4 | 
    5  | Document 5 | 

Can I easily do this with one SQL query and a join between the two tables?

+2  A: 

I think it should be something like this:

SELECT ID, Name,  b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a LEFT JOIN
 ( SELECT DOC_ID, USER_ID 
          FROM HISTORY
              GROUP BY DOC_ID, USER_ID
              HAVING MAX( TIMESTAMP )) as b
    ON a.ID = b.DOC_ID

this might work also:

SELECT ID, Name,  b.USER_ID as LAST_USER_ID
FROM DOCUMENTS a 
  LEFT JOIN HISTORY b ON a.ID = b.DOC_ID
GROUP BY DOC_ID, USER_ID
HAVING MAX( TIMESTAMP )
sfossen
You'd need a LEFT OUTER JOIN to get the last three rows of the required output.
Jonathan Leffler
I couldn't get having max(timestamp) to work on SQL server - does that work on other DBs?
Andy White
And you would probably need: GROUP BY DOC_ID, USER_ID, and you'd need to fix up the HAVING clause...
Jonathan Leffler
@Jonathan - do you know if you can use having in this case? I messed with it a little bit but couldn't get the group by/having to work at all for this. I'm a SQL noob, I'd love to see how to do it if you know.
Andy White
+3  A: 

I couldn't get the "HAVING MAX(TIMESTAMP)" to run in SQL Server - I guess having requires a boolean expression like "having max(TIMESTAMP) > 2009-03-05" or something, which doesn't apply in this case. (I might be doing something wrong...)

Here is something that seems to work - note the join has 2 conditions (not sure if this is good or not):

select
    d.ID,
    d.NAME,
    h."USER_ID" as "LAST_USER_ID"
from Documents d
left join History h
    on d.ID = h.DOC_ID
    and h."TIMESTAMP" =
    (
        select max("TIMESTAMP")
        from "HISTORY"
        where "DOC_ID" = d.ID
    )
Andy White
I guess I could have used max(TIMESTAMP) in the inner query, rather than select top 1 ... order by [TIMESTAMP] desc
Andy White
...Fixed in my answer
Andy White
And you should lose the MS SQL Server barbarisms with square brackets - or replace them with DB2 equivalent notation "TIMESTAMP".
Jonathan Leffler
sorry about that... it's all I know
Andy White
select d.ID, d.NAME, h."USER_ID" as 'LAST_USER_ID'from Documents dleft join History h on d.ID = h.DOC_ID and h."TIMESTAMP" = ( select max("TIMESTAMP") from "HISTORY" where "DOC_ID" = d.ID )
Andy White
I'm having to map from Informix to DB2 - parallel but different issues. :D
Jonathan Leffler
Ok, I tried to fix it for real :) I don't really know the ISO standard though... although I do know [] are not standard!
Andy White
Yes gents the [] are not DB2 standard, the column names as specified (e.g. TIMESTAMP) suffice. One thing I have seen is that DB2 isn't liking the (SELECT...) as a comparison clause in the ON for the JOIN
MadMurf
Interesting... I wondered if that would work on all databases, it's kind of a strange join clause. Have you tried running Jonathan's query below?
Andy White
Yes, thats looking more hopeful. Seems to be returning me the expected results. Generating more test data to verify further. Thanks for the help.
MadMurf
+2  A: 

This doesn't use a join, but for some queries like this I like to inline the select for the field. If you want to catch the situation when no user has accessed you can wrap it with an NVL().

select a.ID, a.NAME,
(select x.user_id
 from HISTORY x
 where x.doc_id = a.id
   and x.timestamp = (select max(x1.timestamp)
                      from HISTORY x1
                      where x1.doc_id = x.doc_id)) as LAST_USER_ID
from DOCUMENTS a
where <your criteria here>
John Wagenleitner
+4  A: 

Revising what Andy White produced, and replacing square brackets (MS SQL Server notation) with DB2 (and ISO standard SQL) "delimited identifiers":

SELECT d.id, d.name, h.last_user_id
    FROM Documents d LEFT JOIN
         (SELECT r.doc_id AS id, user_id AS last_user_id
              FROM History r JOIN
                   (SELECT doc_id, MAX("timestamp") AS "timestamp"
                        FROM History
                        GROUP BY doc_id
                   ) AS l
                   ON  r."timestamp" = l."timestamp"
                   AND r.doc_id      = l.doc_id
         ) AS h
         ON d.id = h.id

I'm not absolutely sure whether "timestamp" or "TIMESTAMP" is correct - probably the latter.

The advantage of this is that it replaces the inner correlated sub-query in Andy's version with a simpler non-correlated sub-query, which has the potential to be (radically?) more efficient.

Jonathan Leffler
A: 
Select ID, Name, User_ID
From Documents Left Outer Join
History a on ID = DOC_ID
Where ( TimeStamp = ( Select Max(TimeStamp)
                      From History b
                      Where a.DOC_ID = b.DOC_ID ) OR
        TimeStamp Is NULL )  /* this accomodates the Left */
Lynette Duffy