I have a table that has about 1/2 million records in it.

Each month we get about 1/2 million more records to import. These are currently shoved into another table in the DB, but will eventually be loaded directly from a txt file. For each of these new records, I have to determine whether we already have that record. If we don't, it needs to be inserted; if we do, it needs to be updated. The logic for these updates is contained in the C# code.

A C# command line program handles the import of this new data, so right now there are 1/2 million select statements, one for each record. Then a bunch (again about 1/2 million) of insert and update statements are generated and run against the database.

It takes about 6 hours for this to run on my workstation. Do you have any ideas on how to speed it up? I need to run through about 60 of these large imports to bring the database up to the current month, and then load the new data once a month.

I think one area that could be improved is the 1/2 million select statements. Perhaps I could issue one select statement to get all the rows, store them in memory, and search them there. Could I use a List for this, or is there a better class? I'll have to search based on two properties (or DB fields).
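To make the in-memory idea concrete, here is a rough sketch that uses a Dictionary keyed on both fields instead of a List, so each lookup is a hash probe rather than a scan. The Record class and its AccountNo and Period fields are made up for illustration, and the tuple key assumes C# 7 or later; on older compilers a small key struct that overrides Equals and GetHashCode would play the same role.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record shape; AccountNo and Period stand in for the two DB fields.
class Record
{
    public string AccountNo;
    public string Period;
    public decimal Amount;
}

static class ImportSketch
{
    // Build the index once from the rows returned by a single SELECT.
    public static Dictionary<(string, string), Record> BuildIndex(IEnumerable<Record> existing)
    {
        var index = new Dictionary<(string, string), Record>();
        foreach (var r in existing)
            index[(r.AccountNo, r.Period)] = r;
        return index;
    }

    // Split incoming rows into inserts and updates using constant-time lookups.
    public static void Classify(IEnumerable<Record> incoming,
                                Dictionary<(string, string), Record> index,
                                List<Record> toInsert,
                                List<Record> toUpdate)
    {
        foreach (var r in incoming)
        {
            if (index.ContainsKey((r.AccountNo, r.Period)))
                toUpdate.Add(r);
            else
                toInsert.Add(r);
        }
    }
}

static class Program
{
    static void Main()
    {
        var existing = new List<Record>
        {
            new Record { AccountNo = "A", Period = "2009-01", Amount = 10m }
        };
        var incoming = new List<Record>
        {
            new Record { AccountNo = "A", Period = "2009-01", Amount = 12m },
            new Record { AccountNo = "B", Period = "2009-01", Amount = 5m }
        };

        var toInsert = new List<Record>();
        var toUpdate = new List<Record>();
        ImportSketch.Classify(incoming, ImportSketch.BuildIndex(existing), toInsert, toUpdate);

        Console.WriteLine($"{toInsert.Count} inserts, {toUpdate.Count} updates");
        // prints "1 inserts, 1 updates"
    }
}
```

With half a million rows, a List would need a linear search per incoming record, while the dictionary keeps each check at roughly constant cost. Memory use depends on how wide the rows are, so it may be worth loading only the two key fields plus whatever the update logic needs.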

+2  A: 

Take a look at the .NET Framework 2.0's SqlBulkCopy class.

MSDN Ref.
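A minimal sketch of how SqlBulkCopy might be used to push the incoming rows into a staging table in one pass. The connection string parameter and the table name dbo.ImportStaging are placeholders, not names from the question, and the DataTable is assumed to have its columns in the same order as the destination table.

```csharp
using System.Data;
using System.Data.SqlClient;

static class BulkLoadSketch
{
    // connectionString and "dbo.ImportStaging" are placeholders for illustration.
    public static void LoadStaging(string connectionString, DataTable rows)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulk = new SqlBulkCopy(connection))
            {
                bulk.DestinationTableName = "dbo.ImportStaging";
                bulk.BatchSize = 10000;    // rows sent per batch; tune for your data
                bulk.BulkCopyTimeout = 0;  // 0 means no time limit for long loads
                bulk.WriteToServer(rows);  // columns map by ordinal unless ColumnMappings is set
            }
        }
    }
}
```

Note that SqlBulkCopy only inserts. The "update if it already exists" part still has to happen afterwards, for example with set-based statements run against the staging table.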

Mitch Wheat
Thanks, I'll look into that. I'm always finding new things in the BCL. I don't know that it will be worth redoing all the LINQ-to-SQL for querying and inserting, but I may have to if I want better performance.
Lance Fisher
A: 

Yes, move the logic to a single stored proc that does a bulk insert into a temp table (without logging), and then processes all the records in the temp table in two separate statements: one Update for all the records that already exist in the destination table, and one Insert for all those that do not:

    Update D Set
        ColName = T.ColName,
        [repeat for all cols]
    From TmpTable T Join DestTable D On D.Pk = T.Pk

    Insert DestTable(ColList)
    Select [ColList]
    From TmpTable T
    Where Not Exists (Select * From DestTable D
                      Where D.Pk = T.Pk)

If this creates transactions too large for your transaction log, break the work up into smaller chunks.

Charles Bretana