We have a small C# tool that we threw together to parse a data file, build up some objects, and insert them into the DB.

The logic is essentially:

string[] lines = File.ReadAllLines("C:\\Temp\\Data.dat");

foreach (string line in lines)
{
    MyDataObject obj = ParseObject(line);
    myDataContext.MyDataObjects.InsertOnSubmit(obj);
}

// all ~30,000 pending inserts are sent to the DB here
myDataContext.SubmitChanges();

This was fine at the outset, as the data file was only ~1,000 lines long each day, but recently the file has grown to ~30,000 lines and the process has become painfully slow.

Everything up to the SubmitChanges() call is fine, but once it starts the process of dumping 30,000 inserts to the DB, it just grinds to a halt. As a test I dummied up 30,000 insert statements and ran them directly from QA. It took roughly 8 minutes.

After 8 minutes, the C# / Linq version had only completed about 25% of the inserts.

Does anyone have any suggestions on how I might go about optimizing this?

+1  A: 

I had the same question some time ago. I was inserting 1,000,000 new entries into a DB, and I found that calling SubmitChanges every 500 rows was the fastest way.

I can't promise that 500 rows at a time is the fastest possible; our environment is quite weird...
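
A minimal sketch of that batching approach, reusing the names from the question (the 500-row batch size is just a starting point to tune for your own schema):

const int BatchSize = 500;
int count = 0;

foreach (string line in lines)
{
    MyDataObject obj = ParseObject(line);
    myDataContext.MyDataObjects.InsertOnSubmit(obj);

    // flush the pending inserts every BatchSize rows instead of all at once
    if (++count % BatchSize == 0)
        myDataContext.SubmitChanges();
}

// flush whatever is left over
myDataContext.SubmitChanges();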

Davide Vosti
I suspect it has to do with the maximum number of parameters a single SQL command can contain. I believe the maximum is 2100 parameters. If you were inserting records with 4 columns, an optimum of about 500 would make sense (2100 / 4 ≈ 525 rows per command).
Thorarin
In fact, my query had 4 parameters! Thanks for the hint!!!
Davide Vosti
+5  A: 

If you are writing a large volume of homogeneous data, SqlBulkCopy may be a more appropriate tool, for example paired with CsvReader to read the lines (since SqlBulkCopy can accept an IDataReader, you don't have to buffer all 30k lines into memory).

If the data is CSV, this can be as simple as:

// assuming CsvReader is the LumenWorks "fast CSV reader", which implements IDataReader;
// the 'true' tells it the file has a header row
using (CsvReader reader = new CsvReader(new StreamReader(path), true))
using (SqlBulkCopy bcp = new SqlBulkCopy(CONNECTION_STRING))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.WriteToServer(reader);
}

If the data is more complex (non-CSV), then SimpleDataReader might be useful - you just subclass it and add the code to represent your data per row.
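
If subclassing a reader is more than the job needs, another option is to build the rows into an in-memory DataTable and pass that to SqlBulkCopy (at the cost of buffering everything). A rough sketch, where the table, column and property names are placeholders for your own schema:

// build an in-memory DataTable and push it with SqlBulkCopy
// (uses System.Data and System.Data.SqlClient)
DataTable table = new DataTable();
table.Columns.Add("ColumnA", typeof(string));
table.Columns.Add("ColumnB", typeof(int));

foreach (string line in File.ReadAllLines(path))
{
    // ParseObject is the question's parsing routine; map its fields to columns here
    MyDataObject obj = ParseObject(line);
    table.Rows.Add(obj.PropertyA, obj.PropertyB);
}

using (SqlBulkCopy bcp = new SqlBulkCopy(CONNECTION_STRING))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.WriteToServer(table);
}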

Marc Gravell
Cheers Marc, I have gone down this route, building up a DataTable in memory and passing that to the BulkCopy object. It's not flat CSV unfortunately and needs a lot of manipulation before shoving it on to the server. Thanks
Eoin Campbell
+1  A: 

You might want to try a multi-threaded approach.

  1. Partition the record set into smaller batches (1,000 records each?) and push them onto a Stack.
  2. Have a worker class that pops a batch off the top of the Stack and inserts it on its own thread, with its own DataContext.
  3. While that batch is inserting, a second worker is started for the next batch of records.
  4. Internal logic dictates how many inserts can run at once (5? 10?).

This may allow the inserts to proceed faster than merely running SubmitChanges() every few records, since multiple batches can be inserted at the same time (see the sketch below).
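
A rough sketch of that idea, using one DataContext per worker thread and a semaphore to cap concurrency. MyDataContext, MyDataObject and ParseObject are taken from the question; the batch size and thread count are guesses to tune.

// requires System.Threading, System.Collections.Generic and System.IO
const int BatchSize = 1000;
const int MaxThreads = 5;

// 1. Partition the parsed records into batches and push them onto a Stack.
string[] lines = File.ReadAllLines("C:\\Temp\\Data.dat");
Stack<List<MyDataObject>> batches = new Stack<List<MyDataObject>>();
for (int i = 0; i < lines.Length; i += BatchSize)
{
    List<MyDataObject> batch = new List<MyDataObject>();
    for (int j = i; j < Math.Min(i + BatchSize, lines.Length); j++)
        batch.Add(ParseObject(lines[j]));
    batches.Push(batch);
}

// 2-4. Insert batches on worker threads, at most MaxThreads at a time,
//      each thread using its own DataContext.
Semaphore throttle = new Semaphore(MaxThreads, MaxThreads);
List<Thread> workers = new List<Thread>();
while (batches.Count > 0)
{
    List<MyDataObject> batch = batches.Pop();
    throttle.WaitOne();
    Thread worker = new Thread(() =>
    {
        try
        {
            using (MyDataContext ctx = new MyDataContext())
            {
                ctx.MyDataObjects.InsertAllOnSubmit(batch);
                ctx.SubmitChanges();
            }
        }
        finally
        {
            throttle.Release();
        }
    });
    worker.Start();
    workers.Add(worker);
}

foreach (Thread worker in workers)
    worker.Join();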

Yaakov Ellis
+1  A: 

This is a database task and should be done through SSIS, using a bulk insert.

I can insert 30,000 records in seconds or milliseconds (depending on the number of columns and how complex the data mapping is). I have imports with over a million records that insert in less time than you are spending looping through records one at a time. I even have one 20 million record file that only takes 16 minutes.

HLGEM
Yeah that's great. However it's not feasible. For starters, it's on SQL 2000, so no SSIS; this is DTS territory. Secondly, the data gets submitted from an affiliate to our business dept. and I'm quite content to give them a tool that takes 2 minutes to run to upload the data, instead of giving them full access to the DB so they can muck around with Enterprise Manager.
Eoin Campbell
You can do this with DTS, then have your app call the DTS package. Or do it in a stored proc using BULK INSERT and have your app call the proc.
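
A minimal sketch of the app side of that second option, assuming a stored procedure (usp_ImportDataFile is a made-up name) that wraps the BULK INSERT and takes the file path as a parameter:

// uses System.Data and System.Data.SqlClient; the proc name and parameter are assumptions
// note: BULK INSERT reads the path from the SQL Server's point of view, not the client's
using (SqlConnection conn = new SqlConnection(CONNECTION_STRING))
using (SqlCommand cmd = new SqlCommand("usp_ImportDataFile", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@FilePath", @"C:\Temp\Data.dat");
    conn.Open();
    cmd.ExecuteNonQuery();
}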
HLGEM