After much fiddling, I've managed to install the right ODBC driver and have successfully created a linked server on SQL Server 2008, by which I can access my PostgreSQL db from SQL server.
I'm copying all of the data from some of the tables in the PgSQL DB into SQL Server using merge statements that take the following form:
with mbRemote as
(
select
*
from
openquery(someLinkedDb,'select * from someTable')
)
merge into someTable mbLocal
using mbRemote on mbLocal.id=mbRemote.id
when matched
/*edit*/
/*clause below really speeds things up when many rows are unchanged*/
/*can you think of anything else?*/
and not (mbLocal.field1=mbRemote.field1
and mbLocal.field2=mbRemote.field2
and mbLocal.field3=mbRemote.field3
and mbLocal.field4=mbRemote.field4)
/*end edit*/
then
update
set
mbLocal.field1=mbRemote.field1,
mbLocal.field2=mbRemote.field2,
mbLocal.field3=mbRemote.field3,
mbLocal.field4=mbRemote.field4
when not matched then
insert
(
id,
field1,
field2,
field3,
field4
)
values
(
mbRemote.id,
mbRemote.field1,
mbRemote.field2,
mbRemote.field3,
mbRemote.field4
)
WHEN NOT MATCHED BY SOURCE then delete;
After this statement completes, the local (SQL Server) copy is fully in sync with the remote (PgSQL server).
A few questions about this approach:
- is it sane?
- it strikes me that an update will be run over all fields in local rows that haven't necessarily changed. The only prerequisite is that the local and remote id field match. Is there a more fine grained approach/a way of constraining the merge statment to only update rows that have actually changed?