I have an application that imports large volumes of data daily, on the order of several hundred thousand records.
The data comes from different sources; it is read using C# and then bulk inserted into the database.
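For reference, the load step is essentially a SqlBulkCopy from a DataTable into a staging table, something along these lines (a simplified sketch; the table and column names are placeholders):

```csharp
using System.Data;
using System.Data.SqlClient;

static void BulkLoad(DataTable rows, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        using (var bulk = new SqlBulkCopy(connection))
        {
            // Placeholder staging table and column mappings.
            bulk.DestinationTableName = "dbo.ImportStaging";
            bulk.BatchSize = 10000;   // send rows in chunks rather than one huge batch
            bulk.BulkCopyTimeout = 0; // no timeout; daily imports can run long

            bulk.ColumnMappings.Add("SourceId", "SourceId");
            bulk.ColumnMappings.Add("Amount", "Amount");

            bulk.WriteToServer(rows);
        }
    }
}
```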
This data is then processed:
- different tables are linked
- new tables are generated
- data is corrected using complicated algorithms (the totals in certain tables have to net to zero)
Most of this processing is done in stored procedures.
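The correction step, for example, is a set-based UPDATE wrapped in a stored procedure that the C# side only triggers; roughly like this (the procedure and parameter names are made up for illustration):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static void RunCorrection(string connectionString, DateTime importDate)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.usp_BalanceTotals", connection))
    {
        // Illustrative names only; the real procedure runs a set-based
        // UPDATE so that the totals in the affected tables net to zero.
        command.CommandType = CommandType.StoredProcedure;
        command.CommandTimeout = 0; // corrections over large tables can run long
        command.Parameters.AddWithValue("@ImportDate", importDate);

        connection.Open();
        command.ExecuteNonQuery();
    }
}
```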
Although some of the complex processing would be simpler in C#, extracting the data into a DataSet and reinjecting it would slow things down considerably.
You may ask why I do not process the data before inserting it into the database. The answer is that I do not think it practical to manipulate hundreds of thousands of records in memory, and SQL's set-based commands help when creating lots of records.
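By "set-based commands" I mean things like creating all the derived rows in a single INSERT ... SELECT rather than looping in C#; for illustration (hypothetical table and column names):

```csharp
using System.Data.SqlClient;

static void CreateDerivedRows(string connectionString)
{
    // Hypothetical names: one statement creates every derived row at once,
    // which avoids a round trip per record from C#.
    const string sql = @"
        INSERT INTO dbo.SourceTotals (SourceId, Total)
        SELECT SourceId, SUM(Amount)
        FROM dbo.ImportStaging
        GROUP BY SourceId;";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        connection.Open();
        command.ExecuteNonQuery();
    }
}
```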
This will probably spark the age-old debate about stored procedures and their pros and cons.
(e.g. how do you unit test stored procedures?)
What I would like in response is your experience with large volumes of data and how you tackled the problem.