views:

3585

answers:

3

Within SQL Server Integration Services (SSIS) there is the ability to set up a connection to a flat file that can hold millions of records and have that data pushed to a SQL Server database. Furthermore, this process can be called from a C# app by referencing and using the Microsoft.SqlServer.Dts.Runtime namespace.
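For context, invoking a package from C# through that namespace looks roughly like this. This is only a sketch: it assumes SSIS is installed on the machine, and the package path and name are hypothetical.

```csharp
using System;
using Microsoft.SqlServer.Dts.Runtime;

class PackageRunner
{
    static void Main()
    {
        // Hypothetical package path; requires the SSIS runtime assemblies.
        var app = new Application();
        Package package = app.LoadPackage(@"C:\Packages\LoadFlatFile.dtsx", null);

        // Execute synchronously and report Success or Failure.
        DTSExecResult result = package.Execute();
        Console.WriteLine(result);
    }
}
```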

Would a flat file with millions of records best be run through SSIS, or would the collective "you" prefer a C# app with multiple worker threads (one to read a row into a shared variable, one to write from that variable to the DB) and a "mother" class that manages those threads? (The dev box has two CPUs.)
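To make the hand-rolled alternative concrete, here is a minimal sketch of that reader/writer split, assuming .NET 4's BlockingCollection as the shared buffer and SqlBulkCopy as the writer. The connection string, file path, delimiter, and table/column names are all hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Threading;

class FlatFileLoader
{
    // Hypothetical connection string and destination table.
    const string ConnStr = "Server=.;Database=Staging;Integrated Security=true";

    static void Main()
    {
        var queue = new BlockingCollection<string>(boundedCapacity: 10000);

        // Producer thread: reads the flat file line by line.
        var reader = new Thread(() =>
        {
            foreach (var line in File.ReadLines(@"C:\data\import.txt"))
                queue.Add(line);
            queue.CompleteAdding();
        });

        // Consumer thread: batches rows and bulk-copies them to the DB.
        var writer = new Thread(() =>
        {
            var table = new DataTable();
            table.Columns.Add("Col1", typeof(string));
            table.Columns.Add("Col2", typeof(string));

            using (var bulk = new SqlBulkCopy(ConnStr) { DestinationTableName = "dbo.Import" })
            {
                foreach (var line in queue.GetConsumingEnumerable())
                {
                    var fields = line.Split('\t');
                    table.Rows.Add(fields[0], fields[1]);
                    if (table.Rows.Count == 5000)   // flush every 5,000 rows
                    {
                        bulk.WriteToServer(table);
                        table.Clear();
                    }
                }
                if (table.Rows.Count > 0)           // flush the final partial batch
                    bulk.WriteToServer(table);
            }
        });

        reader.Start();
        writer.Start();
        reader.Join();
        writer.Join();
    }
}
```

Even this stripped-down version has to get batching, back-pressure, and shutdown right, which is part of what the answers below are weighing against just using SSIS.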

I have seen these figures (from the SQL team blog) showing that for a flat file with a million lines, SSIS is the fastest:

  1. SSIS - FastParse ON = 7322 ms
  2. SSIS - FastParse OFF = 8387 ms
  3. Bulk Insert = 10534 ms
  4. OpenRowset = 10687 ms
  5. BCP = 14922 ms

What are your thoughts?

A: 

SSIS is incredibly fast. In addition, if it's something that needs to occur repeatedly, you can set up a SQL Server Agent job to fire it off on a schedule. Writing it yourself is one thing; trying to make it multithreaded gets a lot more complicated than it appears at first.

I'd recommend SSIS nine times out of ten.

Mike Brown
I appreciate the response, Mike, and I will definitely take it to heart when I look into this further. This will be a repeatable mechanism as well, so thank you once again.
RyanKeeter
+4  A: 

I can only speak for myself and my experience. I would go with SSIS, since this is one of those cases where you might be re-inventing the wheel unnecessarily. This is a repetitive task that has already been solved by SSIS.

I have about 57 jobs (a combination of DTS and SSIS) that I manage on a daily basis. Four of those routinely handle exporting between 5 and 100 million records. The database I manage has about 2 billion rows. I made use of a script task to append the date, down to the millisecond, so that I can run jobs several times a day. I've been doing that for about 22 months now. It's been great!

SSIS jobs can also be scheduled. So you can set it and forget it. I do monitor everything every day, but the file handling part has never broken down.

The only time I had to resort to a custom C# program was when I needed to split very large files into smaller chunks. SSIS is dog slow for that sort of thing: a one-gig text file took about an hour to split using a script task, while the custom C# program handled it in 12 minutes.
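A splitter like that can be quite short in C#. This is a minimal sketch, not the program described above; the paths and chunk size are hypothetical, and it assumes a plain line-oriented text file.

```csharp
using System.IO;

class FileSplitter
{
    static void Main()
    {
        const int linesPerChunk = 1000000;  // hypothetical chunk size
        int chunk = 0, count = 0;
        StreamWriter writer = null;

        // Stream the input so the whole file never sits in memory.
        foreach (var line in File.ReadLines(@"C:\data\big.txt"))
        {
            if (count % linesPerChunk == 0)
            {
                writer?.Dispose();               // close the previous chunk
                chunk++;
                writer = new StreamWriter(@"C:\data\big_" + chunk + ".txt");
            }
            writer.WriteLine(line);
            count++;
        }
        writer?.Dispose();
    }
}
```

Because File.ReadLines streams rather than loading the file, this stays fast and memory-flat even on multi-gigabyte inputs.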

In the end, just use what you feel comfortable using.

hectorsosajr
You have just created a business case for me to take to my PM about this project. Did you inherit those packages, or did you create them?
RyanKeeter
I created everything from scratch. It is a giant integration project with a third party vendor email marketing system. Feel free to contact me if you have more questions about this.
hectorsosajr
A: 

I can't see how using multiple threads would help performance in this case. When transferring large volumes of data, the main bottleneck is usually disk I/O. Spawning multiple threads wouldn't solve that, and my guess is that it would make things worse, since it would introduce lock contention between the multiple threads hitting the database.

polara