Suppose Table 1 has 1,000,000 rows and Table 2 has 50,000 rows.

INPUT

Table 1

Id    User   InternetAmountDue
1     joe    NULL

Table 2

InternetUserId   UserName AmountDue
21                kay     21.00
10091             joe     21.00

I want to merge data from table 2 to table 1 as follows:

  1. If user exists in Table 1, update InternetAmountDue Column
  2. Else, insert new user

OUTPUT

Table 1

Id    User   InternetAmountDue
1     joe    21.00
2     kay    21.00

How can this be done fast given the large volume of data involved?

A: 

Assuming that you want to merge the data dynamically in a query, you could do the following:

select 
  t1.id,
  t1.[user],
  -- prefer the amount from table2 when one exists
  case 
     when t2.AmountDue is not null then t2.AmountDue
     else t1.InternetAmountDue
  end as InternetAmountDue
from table1 t1
left join table2 t2
on t1.[user] = t2.username

It is of course equally easy to update table 1 with the new amount.

cdonner
A: 

Try this.

-- Update users that already exist in Table 1
UPDATE t1
SET t1.InternetAmountDue = t2.AmountDue
FROM [Table 1] t1
INNER JOIN [Table 2] t2
ON t1.[User] = t2.UserName

-- Insert users that exist only in Table 2
INSERT INTO [Table 1] ( [User], InternetAmountDue )
SELECT t2.UserName, t2.AmountDue 
FROM [Table 2] t2
LEFT OUTER JOIN [Table 1] t1
ON t1.[User] = t2.UserName
WHERE t1.[User] IS NULL
enth
what's with the old join syntax in the UPDATE??
KM
Old habit, but yeah...I shouldn't be spreading that.
enth
A: 

Look into the UPSERT pattern (update the row if it exists, otherwise insert it). It might be what you're looking for.

SurroundedByFish
+7  A: 

SQL Server 2008 provides a special construct, MERGE, for exactly this case:

MERGE
INTO    table1 AS t1
USING   table2 AS t2
ON      t2.UserName = t1.[user]
WHEN MATCHED THEN
        UPDATE
        SET    t1.InternetAmountDue = t2.AmountDue
WHEN NOT MATCHED THEN
        INSERT ([user], InternetAmountDue)
        VALUES (t2.UserName, t2.AmountDue);
Quassnoi
+1 as much as I liked the old 'UPSERT' in previous SQL versions, MERGE makes life so much easier
SomeMiscGuy
+1, looks quite elegant.
sheepsimulator
A: 

This will work fastest if done by the ID rather than the name. Since you are joining by name, you will suffer slower performance and possible duplicates.

Once you resolve the duplicates and are ready to do the updates, you should:

  1. Perform the updates first
  2. Then do the inserts

Consider re-indexing.
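
One way to look for duplicate names before joining on them, as a rough sketch that assumes the table and column names from the question (table1.[User], table2.UserName) and has not been run against the real data:

```sql
-- Sketch: names are assumed from the question.
-- Any row returned here is a name that would match more than once in the join.
SELECT UserName, COUNT(*) AS Occurrences
FROM   table2
GROUP  BY UserName
HAVING COUNT(*) > 1;

SELECT [User], COUNT(*) AS Occurrences
FROM   table1
GROUP  BY [User]
HAVING COUNT(*) > 1;

-- After the updates and inserts, rebuild the indexes on the target table.
ALTER INDEX ALL ON table1 REBUILD;
```

If either query returns rows, those names need to be resolved first, otherwise the update can pick an arbitrary amount and the insert can add the same user more than once.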

Raj More
+2  A: 
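-- Insert users that are missing from Table1
INSERT INTO Table1 ([User])
SELECT UserName
FROM Table2
WHERE UserName not in (SELECT [User] FROM Table1)

-- Then set the amount for every matching user, including the new rows
UPDATE t1
SET t1.InternetAmountDue = t2.AmountDue
FROM Table1 t1
  JOIN Table2 t2
  ON t1.[User] = t2.UserName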

Make sure that Table2.UserName is indexed. Make sure that Table1.User is indexed.
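
For example, assuming neither column is indexed yet, something like the following could work. The index names are made up for illustration, and the INCLUDE column is an optional extra rather than a requirement:

```sql
-- Hypothetical index names; adjust to your own naming convention.
CREATE NONCLUSTERED INDEX IX_Table1_User
    ON Table1 ([User]);

-- INCLUDE lets the join read AmountDue from the index without touching the table.
CREATE NONCLUSTERED INDEX IX_Table2_UserName
    ON Table2 (UserName) INCLUDE (AmountDue);
```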

David B