I need to take data from one table and import it into another table. In pseudocode, something like this:

For Each row in table1
    If row.personid is in table2 Then
        update table2.row
    Else
        insert row into table2
    End If
Next

What is the best way to do this in T-SQL? As I understand it, T-SQL doesn't support For Each...Next, so what alternatives do I have?


One of the most common ways is to use cursors. That way you can go through each record that your query returns and handle it accordingly, either with an UPDATE or INSERT.
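For illustration, a minimal cursor sketch of that approach (the table and column names here are assumed, not from the question):

```sql
-- Hypothetical schema: table1 and table2 both have personid and field1.
DECLARE @personid INT, @field1 VARCHAR(50);

DECLARE person_cursor CURSOR FAST_FORWARD FOR
    SELECT personid, field1 FROM table1;

OPEN person_cursor;
FETCH NEXT FROM person_cursor INTO @personid, @field1;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Update the row if it already exists in table2, otherwise insert it
    IF EXISTS (SELECT 1 FROM table2 WHERE personid = @personid)
        UPDATE table2 SET field1 = @field1 WHERE personid = @personid;
    ELSE
        INSERT INTO table2 (personid, field1) VALUES (@personid, @field1);

    FETCH NEXT FROM person_cursor INTO @personid, @field1;
END;

CLOSE person_cursor;
DEALLOCATE person_cursor;
```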


Christopher Estep
Thanks. I saw this as an option - but also read that this is against best practices because cursors are inefficient with memory management, etc.?
@davemackey You read correctly. I can't believe 2 people have suggested cursors for this!
Martin Smith
Don't shoot the messenger, Martin. Cursors are a reasonable response to the pseudocode given and without knowing more information, may be a perfectly fine solution. It depends on how it's going to be used, table size, etc.
Christopher Estep
I'd default to not using them when an obvious set based solution exists though. Interesting thread on this issue.
Martin Smith
+2  A: 

You could use a cursor for this as others have described. Personally I like doing two statements in a row like so:

UPDATE tbl2 SET field1 = tbl1.field1, field2 = tbl1.field2 -- etc.
FROM tbl2
JOIN tbl1 ON tbl2.personid = tbl1.personid

INSERT tbl2 (personid, field1, field2)
SELECT personid, field1, field2 
FROM tbl1
WHERE NOT EXISTS (SELECT 1 FROM tbl2 WHERE tbl2.personid = tbl1.personid)
+3  A: 

All things being equal, set based operations are better.

update t1
set t1.x = t2.x
from table1 t1
inner join table2 t2 on t1.id = t2.t1id


insert into table1
select * from table2 t2 where t2.t1id not in (select id from table1)
@jvilalta are you saying the larger the data set the more reason to loop through each record? I'd have to disagree.
Not that there is more reason, just that doing inserts and updates in one statement puts this all in one transaction, and if the dataset is large enough, the transaction might fail or run for a long time, preventing other transactions from finishing.
@jvilalta I'm pretty sure it won't run as long or take up as much resources as looping through each record.
and in fact should run fine concurrently with other readers, at least in snapshot isolation mode?
@TskTsk If that is the case, you could perform the set based queries over subsets of the key on the table. Ie personid 0 to 500000 and 500001-1000000 etc... and it would still be faster than 1 by one... You could order the records to be inserted via an identity key and insert 500,000 at a time (or whatever the number is) that way as well. Usually databases have a certain overhead associated with data access so by accessing things one at a time you pay for the overhead for each record while set based things get the overhead once for all the records in the set....
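The batching idea in the comment above could be sketched like this (the key ranges, batch size, and table/column names are assumptions for illustration):

```sql
-- Process the set-based insert in chunks of 500,000 personids at a time,
-- keeping each transaction small (hypothetical tables/columns).
DECLARE @batchstart INT = 0, @batchsize INT = 500000, @maxid INT;

SELECT @maxid = MAX(personid) FROM table1;

WHILE @batchstart <= @maxid
BEGIN
    INSERT INTO table2 (personid, field1)
    SELECT t1.personid, t1.field1
    FROM table1 t1
    WHERE t1.personid BETWEEN @batchstart AND @batchstart + @batchsize - 1
      AND NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.personid = t1.personid);

    SET @batchstart = @batchstart + @batchsize;
END;
```

Each loop iteration is still a set-based statement, so you pay the per-statement overhead once per 500,000 rows rather than once per row.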
+2  A: 

Doing this in a while loop is just wrong.
For your situation you can use the new MERGE statement in SQL Server 2008.
Here's a simple example of how to do it.
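A minimal sketch of such a MERGE (the table and column names are assumed, not from the question):

```sql
-- Hypothetical tables: table1 is the source, table2 the target,
-- both keyed on personid.
MERGE table2 AS target
USING table1 AS source
    ON target.personid = source.personid
WHEN MATCHED THEN
    UPDATE SET target.field1 = source.field1
WHEN NOT MATCHED THEN
    INSERT (personid, field1)
    VALUES (source.personid, source.field1);  -- MERGE must end with a semicolon
```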

Mladen Prajdic
+2  A: 

You state TSQL but don't give a version. If you are on SQL2008 the Merge statement should do what you need.

Martin Smith
Thanks, sorry, we use SQL 2008 but some of our DB's are running in SQL 2005/2000 compatibility mode (third-party legacy software)
It might still work. Have you tried it? Edit: It says here it is supported for SQL 2005 compatibility mode but is silent on the 2000 issue!
Martin Smith
+1  A: 

If you're on SQL Server 2008 then the best way to do this is with the MERGE statement. Something like...

MERGE INTO target_table t
USING source_table s
ON t.personid = s.personid
WHEN MATCHED THEN
    UPDATE ...
WHEN NOT MATCHED THEN
    INSERT ...
Greg Beech
+2  A: 

If you're using SQL Server 2008 then you could use the MERGE statement. Maybe something like this:

MERGE table2 AS t  -- target
USING table1 AS s  -- source
    ON ( t.personid = s.personid )
WHEN MATCHED THEN
    UPDATE SET second_column = s.second_column,
               third_column = s.third_column,
               etc = s.etc
WHEN NOT MATCHED THEN
    INSERT ( personid, second_column, third_column, etc )
    VALUES ( s.personid, s.second_column, s.third_column, s.etc );