views:

39

answers:

1

After much fiddling, I've managed to install the right ODBC driver and have successfully created a linked server on SQL Server 2008, by which I can access my PostgreSQL db from SQL server.

I'm copying all of the data from some of the tables in the PgSQL DB into SQL Server using merge statements that take the following form:

with mbRemote as 
    (
        select 
            * 
        from 
            openquery(someLinkedDb,'select * from someTable')
    ) 
merge into someTable mbLocal 
using mbRemote on mbLocal.id=mbRemote.id
when matched 
        /*edit*/ 
        /*clause below really speeds things up when many rows are unchanged*/
        /*can you think of anything else?*/
        and not (mbLocal.field1=mbRemote.field1
                and mbLocal.field2=mbRemote.field2
                and mbLocal.field3=mbRemote.field3
                and mbLocal.field4=mbRemote.field4)
        /*end edit*/
        then 
        update
        set
                mbLocal.field1=mbRemote.field1,
                mbLocal.field2=mbRemote.field2,
                mbLocal.field3=mbRemote.field3,
                mbLocal.field4=mbRemote.field4
when not matched then 
        insert
        (
            id,
            field1,
            field2,
            field3,
            field4
        )
        values
        (
            mbRemote.id,
            mbRemote.field1,
            mbRemote.field2,
            mbRemote.field3,
            mbRemote.field4
        )
WHEN NOT MATCHED BY SOURCE then delete;

After this statement completes, the local (SQL Server) copy is fully in sync with the remote (PgSQL server).

A few questions about this approach:

  1. is it sane?
  2. it strikes me that an update will be run over all fields in local rows that haven't necessarily changed. The only prerequisite is that the local and remote id field match. Is there a more fine grained approach/a way of constraining the merge statment to only update rows that have actually changed?
+1  A: 

That looks like a reasonable method if you're not able or wanting to use a tool like SSIS.

You could add in a check on the when matched line to check if changes have occurred, something like:

when matched and mbLocal.field1 <> mbRemote.field1 then

This many be unwieldy if you have more than a couple of columns to check, so you could add a check column in (like LastUpdatedDate for example) to make this easier.

revelator
I tried the SSIS route, but it was really buggy with x64 SqlServer and an "experimental" x64 odbc driver. This route is really an option of last resort, but is rock solid, which is great. I'll add some info to the main question about my speedup tweak, which is along the lines of your suggestion.
spender