views:

496

answers:

3

I'm trying to update Table1 in DB1 with data from Table2 in DB2. I can connect and get the data from DB2 Table2 into DB1 Table1, but the issue I'm having is getting the MOST RECENT data from DB2 Table2.

I'm looking at 3 fields in DB2: f1, f2, & f3. f1 contains duplicates (and is where I'm matching from DB1 Table1) and f3 is a date field and I want to grab the most recent date to update DB1 Table1. Below is some of the code I've been using:

Update Table1
Set f2 = c.f2, 
    f3 = convert(varchar, c.f3, 101) 
From Table1 b 
    inner join Server.DB.dbo.Table2 c on b.f1 = c.f1
Where b.f1 = c.f1

Sample Data:

c.f1    c.f2      c.f3
8456    RS47354      06/30/2009
8456    M101021      10/31/2009 (want this one)
7840    5574      NULL
7840    RH013057     06/30/2010 (want this one)
7650    RS48100      06/30/2007
7650    RS49010      06/30/2009 (want this one)

b.f1        b.f2         b.f3
8456        Null         Null
7840        Null         Null
7650        Null         Null

Eventually, this will be set inside an SSIS package.

Any and all help appreciated!

-JFV

A: 
and (b.f3 > c.f3 OR b.f3 is null)

In your where clause, of course.

If I'm reading your question right, anyway...

Telos
I don't think that's the answer. b.f3 is NULL for every row. He wants to grab the row with the MAX(c.f3)
Michael Pryor
Good point, I blame mornings. Added an OR clause to handle that... again, if I'm reading correctly, b.f3 won't always be null as it will get updated once, then next time he runs this it still needs to work.
Telos
+1  A: 

I'm not sure if this is the fasted code in the world, it obviously depends on how close the two servers are and how much data you have in each table.

UPDATE Table1
SET 
    f2 = T2.f2, 
    f3 = convert(varchar, T2.f3, 101) 
FROM 
    Table1 T1
INNER JOIN 
    Server.DB.dbo.Table2 T2
ON 
    T1.f1 = T2.f1
WHERE 
    T2.f3 = (SELECT MAX(f3) FROM Server.DB.dbo.Table2 WHERE f1 = T1.f1)

An alternative (if you have that much control) is to create a trigger on Table2 which puts the latest version into a temporary table whenever it is updated.

Update: Corrected the code.

samjudson
You're comparing f1 to f3, so I'm pretty sure that this won't work correctly
Tom H.
@samjudson I was able to get the code working using your method, but it does take some time to get it updated. I was thinking about setting up a trigger to update Table1 immediately after Table2 was updated or inserted that way the information would be updated real-time instead of nightly. Do you think that a triggered update will be faster (after I update en masse the first time)?
JFV
Yes, much faster. Also, I notice you are converting the date from a date to a string. If it is possible to have both fields as dates then it would speed things up as the conversion would not have to take place.
samjudson
+1  A: 
UPDATE
     T1
SET
     f2 = T2.f2,
     f3 = T2.f3  -- If it's a date, save it as a date, not a VARCHAR
FROM
     dbo.Table1 T1
INNER JOIN Server.db.dbo.Table2 T2 ON
     T2.f1 = T1.f1
LEFT OUTER JOIN Server.db.dbo.Table2 T2_later ON
     T2_later.f1 = T2.f1 AND
     T2_later.f3 > T2.f3
WHERE
     T2_later.f1 IS NULL

This may have some performance problems doing it across servers if Table2 is large. It might be better to create a view in that database and use that for the updates:

CREATE VIEW dbo.T2_Latest
AS
     SELECT
          T2.f1,
          T2.f2,
          T2.f3
     FROM
          dbo.Table2 T2
     LEFT OUTER JOIN dbo.Table2 T2_later ON
          T2_later.f1 = T2.f1 AND
          T2_later.f3 > T2.f3
     WHERE
          T2_later.f1 IS NULL

Then you just need to join on f1 (you don't need that criteria in both the INNER JOIN and the WHERE clause by the way). The view will filter out the earlier rows BEFORE it needs to compare them across servers.

In SSIS there are other solutions, using the Merge component, Lookup component, or Join component which will probably work better.

Tom H.
@Tom H. I tried the View, but it's still coming up with duplicate data and I'm not sure why... btw, the date field is going to a nvarchar(10) in the other DB, that's why I reformatted it.
JFV
If there are multiple rows with the same exact date for the same ID then you will get duplicates. You would need to further define what constitutes the "latest" row in that case. As for the date to nvarchar(10), I assumed that you were converting it because that's what it is in the destination DB. My point was that it shouldn't be that in the destination.
Tom H.