I have a table with about half a million records in it.
Each month we get about half a million more records to import. These are currently loaded into another table in the DB, but will eventually be loaded directly from a txt file. For each new record, I have to determine whether we already have it: if we don't, it needs to be inserted; if we do, it needs to be updated. The logic for these updates is contained in the C# code.
A C# command-line program handles the import, so right now there are half a million select statements, one for each record. Then a batch of insert and update statements (again about half a million) is generated and run against the database.
It takes about 6 hours for this to run on my workstation. Do you have any ideas on how to speed it up? I need to run through about 60 of these large imports to bring the database up to the current month, and then load the new data once a month.
I think one area that could be improved is the half a million select statements. Perhaps I could issue a single select statement to get all the existing rows, store them in memory, and search them there. Could I use a List for this, or is there a better class? I'll have to search based on two properties (two DB fields).
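
To make the question concrete, here is a rough sketch of the in-memory lookup I have in mind, using a Dictionary keyed on both fields instead of a List. The names Record, KeyA, KeyB, Amount and ImportSketch are made up for illustration and are not my real schema. It assumes a C# version with tuple support (C# 7 or later) and that the two fields together identify a row uniquely. I have not measured whether this is actually faster.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record shape; KeyA and KeyB stand in for the two DB fields I search on.
public sealed class Record
{
    public string KeyA { get; set; } = "";
    public int KeyB { get; set; }
    public decimal Amount { get; set; }
}

public static class ImportSketch
{
    // Index the existing rows by the two-field key, so each lookup is a
    // hash probe instead of a scan through a List.
    public static Dictionary<(string, int), Record> BuildIndex(IEnumerable<Record> existing)
    {
        var index = new Dictionary<(string, int), Record>();
        foreach (var r in existing)
        {
            // Assumes (KeyA, KeyB) is unique; a later duplicate overwrites an earlier one.
            index[(r.KeyA, r.KeyB)] = r;
        }
        return index;
    }

    // Split the incoming rows into ones to insert and ones to update.
    public static void Classify(
        Dictionary<(string, int), Record> index,
        IEnumerable<Record> incoming,
        List<Record> toInsert,
        List<(Record Existing, Record Incoming)> toUpdate)
    {
        foreach (var r in incoming)
        {
            if (index.TryGetValue((r.KeyA, r.KeyB), out var match))
                toUpdate.Add((match, r));   // already have it: the update logic would run here
            else
                toInsert.Add(r);            // not seen before: needs an insert
        }
    }
}
```

The idea would be to call BuildIndex once with the result of the single select, then run Classify over the month's new records and generate the insert and update statements from the two lists.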