views: 64
answers: 3
Every day a company drops a text file with potentially many records (350,000) onto our secure FTP. We've created a Windows service that runs early in the morning to read the text file into our SQL Server 2005 DB tables. We don't do a BULK INSERT because the data is relational and we need to check it against what's already in our DB to make sure the data remains normalized and consistent.

The problem is that the service can take a very long time (hours). It inserts and updates into tables that our application constantly needs to query and scan, which can hurt the performance of both the DB and the application.

One solution we've thought of is to run the service against a separate DB with the same tables as our live DB. When the service is finished, we can do a BCP into the live DB so it mirrors all of the new records created by the service.

I've never worked with handling millions of records in a DB before and I'm not sure what a standard approach to something like this is. Is this an appropriate way of doing this sort of thing? Any suggestions?

A: 

Before you start mirroring and replicating data, which is complicated and expensive, it would be worthwhile to check your existing service to make sure it is performing efficiently.

Maybe there are table scans you can get rid of by adding an index, or lookup queries you can eliminate with smarter error handling? Analyze the execution plans of the queries your service performs and optimize those.
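For example, if the service looks up existing rows by some natural key before deciding to insert or update, an index on that key can turn a table scan into a seek. A minimal sketch (the table and column names here are hypothetical):

CREATE NONCLUSTERED INDEX IX_MyTable_ExternalRecordId
    ON MyTable (ExternalRecordId)
    INCLUDE (val1, val2);   -- cover the columns the lookup reads so it never touches the base table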

BC
+2  A: 

One mechanism I've seen is to insert the values into a temporary table with the same schema as the target table. NULL IDs signify new records and populated IDs signify updated records. Then use the SQL MERGE statement to merge it into the main table. MERGE will perform better than individual inserts and updates.

Doing it row by row, you incur index maintenance on the table for every statement - that can be costly if the table is tuned for selects. I believe with MERGE it's a single bulk action.

It's touched upon here: http://stackoverflow.com/questions/3282254/whats-a-good-alternative-to-firing-a-stored-procedure-368-times-to-update-the-da/3282289#3282289

There are MSDN articles about SQL merging, so Googling will help you there.
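A minimal sketch of that MERGE approach (note, per the update below, this needs SQL Server 2008 or later; the staging table and column names are placeholders):

-- #Staging is a hypothetical temp table with the same columns as MyTable
MERGE MyTable AS target
USING #Staging AS source
    ON target.Id = source.Id            -- populated IDs mark existing records
WHEN MATCHED THEN
    UPDATE SET target.val1 = source.val1,
               target.val2 = source.val2
WHEN NOT MATCHED BY TARGET THEN          -- NULL IDs never match, so they are treated as new rows
    INSERT (val1, val2)
    VALUES (source.val1, source.val2);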

Update: it turns out you cannot use MERGE in SQL Server 2005 (it was introduced in 2008). Your idea of having another database is usually handled by SQL replication. I have seen a copy of the production database used in production to perform a long-running action (reporting and aggregation of data in that instance), but it wasn't merged back in. I don't know what merging capabilities SQL replication offers, but it would be a good place to look.

Either that, or resolve the reason why you cannot bulk insert/update.

Update 2: as mentioned in the comments, you could stick with the temporary table idea to get the data into the database, and then insert/update your main table by joining onto it. The difference is that SQL is now working with a set, so it can handle the index maintenance accordingly - this should be faster, even with the join.
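For the update half, a set-based join onto the temporary table might look like this (a rough sketch; #Tempo, MyTable and the column names are placeholders, and it assumes populated IDs mark existing rows):

UPDATE t
SET    t.val1 = tmp.val1,
       t.val2 = tmp.val2,
       t.val3 = tmp.val3
FROM   MyTable t
JOIN   #Tempo tmp ON tmp.Id = t.Id   -- only rows that already exist get updated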

Update 3: you could possibly move the data checking out of the insert process and into the service. If you can pause inserts into your table while this happens, it removes the obstacle to bulk inserting (i.e. you are checking for duplicates based on column values because you don't yet have the luxury of an ID). Alternatively, with the temporary table idea, you can add a WHERE condition to first check whether the row already exists in the database, something like:

INSERT INTO MyTable (val1, val2, val3)
SELECT tmp.val1, tmp.val2, tmp.val3
FROM #Tempo tmp
WHERE NOT EXISTS
(
    SELECT *
    FROM MyTable t
    WHERE t.val1 = tmp.val1 AND t.val2 = tmp.val2 AND t.val3 = tmp.val3
)
Adam
+1 A bulk insert of 350,000 rows should complete in less than a minute. `merge` is not available in 2005, but from the question, an `insert into RealTable (col1, col2) select col1, col2 from TempTable` would work here
Andomar
Ah I hadn't noticed the tag - I'll amend my answer. Unfortunately the OP also states he cannot bulk insert.
Adam
+2  A: 

We do much larger imports than that all the time. Create an SSIS package to do the work. Personally I prefer to create a staging table, clean it up, and then do the update or import, but SSIS can do all the cleaning in memory before inserting if you want.
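A rough sketch of that staging flow in T-SQL (the file path, table names, and cleanup rule are made up for illustration; an SSIS data flow would typically replace the BULK INSERT step):

-- load the raw file into a permanent staging table
TRUNCATE TABLE StagingImport;

BULK INSERT StagingImport
FROM '\\ftpserver\drop\daily_feed.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- clean up: remove duplicate rows within the feed (works on SQL Server 2005)
;WITH dupes AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ExternalRecordId ORDER BY ExternalRecordId) AS rn
    FROM StagingImport
)
DELETE FROM dupes WHERE rn > 1;

-- then insert/update the live tables from StagingImport with set-based
-- statements, as in the earlier answer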

HLGEM