I have a new project with the following setup and requirements:
My client has an MSSQL 2005 server (A) in their office. Their vendor has an MSSQL 2005 server (B) in another part of the world, which contains real-time transactional data. My client wants to load the data from (B) to (A) daily, outside office hours. They have datareader access to (B) but nothing more: the vendor will not set up replication, log shipping, etc., and my client is solely responsible for getting their own data so that they can run their own reports/cubes.
The script I used is as follows, using distributed T-SQL and a linked server to (B):
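DECLARE @sqlCommand VARCHAR(2000)
DECLARE @LastProcessedDate DATETIME
-- run the following code for Table 1 to Table XX
SELECT @LastProcessedDate = LastProcessedDate
FROM [ProcessControl]
WHERE TableName = 'table_1'
-- OPENQUERY only accepts a literal query string, so the date is concatenated in as text (style 126 = ISO 8601)
SET @sqlCommand = 'INSERT INTO Table1
SELECT *
FROM OPENQUERY(VendorsLinkedServerName,
''SELECT *
FROM Table1
WHERE LastModified >= ''''' + CONVERT(VARCHAR(23), @LastProcessedDate, 126) + ''''''')'
-- parentheses are required to execute a string; without them EXEC looks for a stored procedure of that name
EXEC (@sqlCommand)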
I did an initial trial for the 10 largest tables with 1 full day of data and it took 1 hour, which is too long. For the test, I had already removed all indexes and constraints from the tables except the primary key (which comprises 1-4 BIGINT columns). Any suggestions on how I can speed up the load, or on a different way of loading the data?
Edit: in case you wonder why the SELECT statement is written this way: in the above example, Table1 in (A) is in an ETL database, and the data will subsequently be compared to determine inserts/updates/deletes in the actual reporting database in (A).
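To make that comparison step concrete, here is a rough sketch of how it could look on SQL 2005, which has no MERGE statement. This is not the actual code: the temp tables #EtlTable1 and #RptTable1, the single-column key Id, and the Amount column are made-up stand-ins for the ETL and reporting copies of Table1. It covers only updates and inserts. Rows deleted on (B) never show up in a pull filtered on LastModified, so detecting deletes would need a separate comparison of the keys.

```sql
-- Hypothetical stand-ins for the ETL copy and the reporting copy of Table1.
CREATE TABLE #EtlTable1 (Id BIGINT PRIMARY KEY, Amount DECIMAL(18,2), LastModified DATETIME);
CREATE TABLE #RptTable1 (Id BIGINT PRIMARY KEY, Amount DECIMAL(18,2), LastModified DATETIME);

-- Sample data (single-row inserts, since SQL 2005 has no multi-row VALUES).
INSERT INTO #RptTable1 (Id, Amount, LastModified) VALUES (1, 10.00, '20100101');
INSERT INTO #EtlTable1 (Id, Amount, LastModified) VALUES (1, 12.50, '20100102');  -- changed row
INSERT INTO #EtlTable1 (Id, Amount, LastModified) VALUES (2, 99.00, '20100102');  -- new row

-- Update rows that already exist in reporting and were modified more recently on the source.
UPDATE r
SET Amount = e.Amount,
    LastModified = e.LastModified
FROM #RptTable1 AS r
INNER JOIN #EtlTable1 AS e ON e.Id = r.Id
WHERE e.LastModified > r.LastModified;

-- Insert rows whose key is not yet present in reporting.
INSERT INTO #RptTable1 (Id, Amount, LastModified)
SELECT e.Id, e.Amount, e.LastModified
FROM #EtlTable1 AS e
WHERE NOT EXISTS (SELECT 1 FROM #RptTable1 AS r WHERE r.Id = e.Id);

-- Inspect the result.
SELECT Id, Amount, LastModified FROM #RptTable1 ORDER BY Id;

DROP TABLE #EtlTable1;
DROP TABLE #RptTable1;
```

With a composite primary key of 1-4 BIGINT columns, the join and the NOT EXISTS condition would list every key column instead of the single Id used here.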