tags:

views:

188

answers:

2

Hello,

I have the same "problem" as described in (http://stackoverflow.com/questions/1078280/last-record-of-join-table): I need to join a "Master Table" with a "History Table" whereas I only want to join the latest (by date) Record of the the history table. So whenever I query a record for the mastertable I also geht the "latest" data of the History Table.

Master Table
  ID
  FIRSTNAME
  LASTNAME
  ...

History Table
  ID
  LASTACTION
  DATE

This is possible by joining both tables and using a subselect to retrieve the latest history table record as described in the answer given in the link above.

My Quesions are: How can I solve the problem, that there might be in theory two History Records with the same date?

Is this kind of joining with the subselect really the best solution in terms of performance (and in general)? What do you think (I am NO expert in all this stuff) if I integrate a further attribute in the History table that is named "ISLATESTRECORD" as a boolean Flag that I manage manually (and that has a unique constrained). This attribute will then explicitly mark the latest record and I do not need any subselects as I can directly use this attribute in the where clause of the join. On the other hand, this makes inserting a new record of course a little bit more complicated: I first have to remove the "ISLATESTRECORD" flag from the latest record, I have to insert the new History Record with the "ISLATESTRECORD" set and commit the transaction.

What do you think is the recommended solution? I do not have any clue about the performance impact of the subselects: I might have millions of "Mastertable" Records" that I have to search for a specific record also using in the search attributes of the joined History table like: "Give me the Mastertable Record with FIRSTNAME XYZ and the LASTACTION (of the History Table) was "changed_name". So this subselect might be called millions of times.

Or is it better work with a subselect to find the latest record, as subselects are very efficient and its better to keep everything normalized?

Thank you very much

A: 

If the history table has a Primary Key (and all tables should), you can modify the subselect to extract the record with either the larger (or the smaller) PK value of the multiples that match the date criteria...

  Select M.*, H.* 
  From Master M
     Join History H 
         On H.PK = (Select Max(PK) From History
                    Where FK = M.PK
                       And Date = (Select Max(Date) From History
                                   Where FK = M.PK))

As to performance, that can be addressed by adding the appropriate indices to these tables (History.Date, History.FK) but in general, depending on the specific table data distribution patterns, sub queries can adversely affect performance.

Charles Bretana
A: 

I solve your problem with a query on your existing tables, and on your tables with an auto-incrementing identity column added to the history table. By adding an auto-incrementing identity column on your history table, you can get around the unique problem of the dates, and make the query easier.

To solve the problem with your tables (with SQL Server example code):

DECLARE @MasterTable table (MasterID int,FirstName varchar(20),LastName varchar(20))
DECLARE @HistoryTable table (MasterID int,LastAction char(1),HistoryDate datetime)

INSERT INTO @MasterTable VALUES (1,'AAA','aaa')
INSERT INTO @MasterTable VALUES (2,'BBB','bbb')
INSERT INTO @MasterTable VALUES (3,'CCC','ccc')

INSERT INTO @HistoryTable VALUES (1,'I','1/1/2009')
INSERT INTO @HistoryTable VALUES (1,'U','2/2/2009')
INSERT INTO @HistoryTable VALUES (1,'U','3/3/2009')  --<<dups
INSERT INTO @HistoryTable VALUES (1,'U','3/3/2009')  --<<dups
INSERT INTO @HistoryTable VALUES (2,'I','5/5/2009')
INSERT INTO @HistoryTable VALUES (3,'I','7/7/2009')
INSERT INTO @HistoryTable VALUES (3,'U','8/8/2009')

SELECT
    MasterID,FirstName,LastName,LastAction,HistoryDate
    FROM (SELECT
              m.MasterID,m.FirstName,m.LastName,h.LastAction,h.HistoryDate,ROW_NUMBER() OVER(PARTITION BY m.MasterID ORDER BY m.MasterID) AS RankValue
              FROM @MasterTable m
                  INNER JOIN (SELECT
                                  MasterID,MAX(HistoryDate) AS MaxDate
                                  FROM @HistoryTable
                                  GROUP BY MasterID
                             ) dt ON m.MasterID=dt.MasterID
                  INNER JOIN @HistoryTable h ON dt.MasterID=h.MasterID AND dt.MaxDate=h.HistoryDate
         ) AllRows
    WHERE RankValue=1

OUTPUT:

MasterID    FirstName LastName LastAction HistoryDate
----------- --------- -------- ---------- -----------
1           AAA       aaa      U          2009-03-03 
2           BBB       bbb      I          2009-05-05 
3           CCC       ccc      U          2009-08-08 

(3 row(s) affected)

To solve the problem with a better, HistoryTable (with SQL Server example code): it is better because it has an auto-incrementing history id identity column

DECLARE @MasterTable table (MasterID int,FirstName varchar(20),LastName varchar(20))
DECLARE @HistoryTableNEW table (HistoryID int identity(1,1), MasterID int,LastAction char(1),HistoryDate datetime)

INSERT INTO @MasterTable VALUES (1,'AAA','aaa')
INSERT INTO @MasterTable VALUES (2,'BBB','bbb')
INSERT INTO @MasterTable VALUES (3,'CCC','ccc')

INSERT INTO @HistoryTableNEW VALUES (1,'I','1/1/2009')
INSERT INTO @HistoryTableNEW VALUES (1,'U','2/2/2009')
INSERT INTO @HistoryTableNEW VALUES (1,'U','3/3/2009')  --<<dups
INSERT INTO @HistoryTableNEW VALUES (1,'U','3/3/2009')  --<<dups
INSERT INTO @HistoryTableNEW VALUES (2,'I','5/5/2009')
INSERT INTO @HistoryTableNEW VALUES (3,'I','7/7/2009')
INSERT INTO @HistoryTableNEW VALUES (3,'U','8/8/2009')

SELECT
    m.MasterID,m.FirstName,m.LastName,h.LastAction,h.HistoryDate,h.HistoryID
    FROM @MasterTable m
        INNER JOIN (SELECT
                        MasterID,MAX(HistoryID) AS MaxHistoryID
                        FROM @HistoryTableNEW
                        GROUP BY MasterID
                   ) dt ON m.MasterID=dt.MasterID
        INNER JOIN @HistoryTableNEW h ON dt.MasterID=h.MasterID AND dt.MaxHistoryID=h.HistoryID

OUTPUT:

MasterID    FirstName LastName LastAction HistoryDate             HistoryID
----------- --------- -------- ---------- ----------------------- ---------
1           AAA       aaa      U          2009-03-03 00:00:00.000 4
2           BBB       bbb      I          2009-05-05 00:00:00.000 5
3           CCC       ccc      U          2009-08-08 00:00:00.000 7

(3 row(s) affected)
KM