I am about to start writing a Windows Forms application that will open a pipe-delimited .txt file about 230 MB in size. The app will then insert this data into a SQL Server 2005 database (obviously this needs to happen swiftly). I am using C# 3.0 and .NET 3.5 for this project.

I am not asking for the app, just some communal advice here and warnings about potential pitfalls. From the site I have gathered that SqlBulkCopy is a prerequisite; is there anything else I should think about? (I suspect that just opening the txt file from a Forms app will be a large endeavor; maybe break it into BLOB data?)

Thank you, and I will edit the question for clarity if anyone needs it.

+1  A: 

This is going to be a streaming endeavor.

If you can, do not use transactions here. The transactional cost will simply be too great.

So what you're going to do is read the file a line at a time and insert it a line at a time. You should dump failed inserts into another file that you can diagnose later to see where they failed.

At first I would try a bulk insert of a couple of hundred rows just to see that the streaming is working properly, and then you can open it up as much as you want.
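A minimal sketch of that read/insert/error-file loop, assuming hypothetical file paths and an `InsertRow` placeholder you would swap for your real data-access code:

```csharp
using System;
using System.IO;

class StreamingLoader
{
    // Hypothetical insert hook -- replace with your real data-access code.
    static void InsertRow(string[] fields) { /* ... */ }

    static void Main()
    {
        using (StreamReader reader = new StreamReader(@"C:\data\input.txt"))
        using (StreamWriter errors = new StreamWriter(@"C:\data\failed.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)  // one line in memory at a time
            {
                try
                {
                    InsertRow(line.Split('|'));
                }
                catch (Exception ex)
                {
                    // Dump the failed line so it can be diagnosed later.
                    errors.WriteLine("{0}\t{1}", ex.Message, line);
                }
            }
        }
    }
}
```

Because only one line is held in memory at a time, the 230 MB file size never becomes a problem.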

Orion Adrian
+14  A: 

Do you have to write a WinForms app? It might be much easier and faster to use SSIS. There are some built-in tasks available, especially the Bulk Insert task.

It is also worth checking the Flat File Bulk Import methods speed comparison in SQL Server 2005.

Update: If you are new to SSIS, check out some of these sites to get you on the fast track: 1) SSIS Control Flow Basics 2) Getting Started with SQL Server Integration Services

Here is another how-to on importing an Excel file into SQL Server 2005.

A lot easier and much quicker in SSIS... agreed.
I find SSIS to be a huge pain in the keester. It craps out for odd reasons more often and requires DBA access to the database server to troubleshoot/fix/re-run (which is restricted in our production environment).
Ron Savage
I agree it takes a little bit of mastery, especially in troubleshooting and deployment.
I really like this idea Gulzar, do you have any additional super links that would help in this project? The two you provided are fantastic.
@Ron: I agree 100%. Having used DTS and knowing it thoroughly, I can't stand SSIS and gave up on it for a large project much like the one this question asks about: millions of rows inserted from flat files whose creation I have no control over, with anomalies I can't correct. SSIS just fails.
Optimal Solutions
+1  A: 

You could try using SqlBulkCopy. It lets you pull from "any data source".
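For illustration, here is a hedged sketch of feeding SqlBulkCopy from a DataTable that is flushed every few thousand rows so the whole 230 MB file never sits in memory. The connection string, destination table, and two-column layout are placeholders — match them to your real schema:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

class BulkLoader
{
    const int BatchRows = 5000;   // tune to taste

    static void Main()
    {
        // Placeholder schema: two string columns. Match your real file/table.
        DataTable table = new DataTable();
        table.Columns.Add("Col1");
        table.Columns.Add("Col2");

        using (StreamReader reader = new StreamReader(@"C:\data\input.txt"))
        using (SqlBulkCopy bulk = new SqlBulkCopy("connection string here"))
        {
            bulk.DestinationTableName = "dbo.Target";
            bulk.BulkCopyTimeout = 0;    // no timeout on a long load

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                table.Rows.Add(line.Split('|'));
                if (table.Rows.Count == BatchRows)
                {
                    bulk.WriteToServer(table);
                    table.Clear();       // keep memory usage flat
                }
            }
            if (table.Rows.Count > 0)
                bulk.WriteToServer(table);   // final partial batch
        }
    }
}
```

SqlBulkCopy can also consume an IDataReader directly, which avoids the intermediate DataTable entirely if you write a small reader over the file.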


If the column format of the file matches the target table where the data needs to end up, I prefer using the command line utility bcp to load the data file. It's blazingly fast, and you can specify an error file for any "odd" records that fail to be inserted.

Your app could kick off the command; you would just need to store the command line parameters for it (server, database, username / password or trusted connection, table, error file, etc.).

I like this method better than running a BULK INSERT SQL command because the data file isn't required to be on a system accessible by the database server. To use BULK INSERT you have to specify the path to the data file to load, so it must be a path visible to and readable by the system user on the database server that is running the load. Too much hassle for me usually. :-)
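As a sketch, kicking bcp off from the app might look like this; the database, table, server name, and file paths are hypothetical. The switches shown are -c (character data), -t (field terminator), -T (trusted connection), and -e (error file for rejected rows):

```csharp
using System.Diagnostics;

class BcpRunner
{
    static void Main()
    {
        ProcessStartInfo psi = new ProcessStartInfo
        {
            FileName = "bcp",
            // MyDb.dbo.Target, myserver, and the paths are placeholders.
            Arguments = "MyDb.dbo.Target in \"C:\\data\\input.txt\" " +
                        "-c -t \"|\" -S myserver -T -e \"C:\\data\\bcp.err\"",
            UseShellExecute = false,
            RedirectStandardOutput = true
        };

        using (Process p = Process.Start(psi))
        {
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            // Non-zero exit code: check the -e error file for the rejected rows.
        }
    }
}
```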


Ron Savage
+1  A: 

Just as a side note, it's sometimes faster to drop the indices of your table and recreate them after the bulk insert operation.
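As a sketch (the index name, table, and connection string are hypothetical), that can be as simple as bracketing the load with two commands:

```csharp
using System.Data.SqlClient;

class IndexToggle
{
    static void Exec(SqlConnection conn, string sql)
    {
        using (SqlCommand cmd = new SqlCommand(sql, conn))
            cmd.ExecuteNonQuery();
    }

    static void Main()
    {
        using (SqlConnection conn = new SqlConnection("connection string here"))
        {
            conn.Open();
            // Drop the index so the bulk insert doesn't maintain it row by row...
            Exec(conn, "DROP INDEX IX_Target_Col1 ON dbo.Target");

            // ... run the bulk insert here ...

            // ...then rebuild it once, over the finished table.
            Exec(conn, "CREATE INDEX IX_Target_Col1 ON dbo.Target (Col1)");
        }
    }
}
```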


The size of data you're talking about actually isn't that gigantic. I don't know what your efficiency concerns are, but if you can wait a few hours for it to insert, you might be surprised at how easy this would be to accomplish with a really naive technique of just INSERTing each row one at a time. Batching together a thousand or so rows at a time and submitting them to SQL server may make it quite a bit faster as well.

Just a suggestion that could save you some serious programming time, if you don't need it to be as fast as conceivably possible. Depending on how often this import has to run, saving a few days of programming time could easily be worth it in exchange for waiting a few hours while it runs.
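A sketch of the batched variant, committing a transaction every thousand rows so each commit covers one batch; the connection string, table, and two-column layout are placeholders:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

class NaiveLoader
{
    static void Main()
    {
        using (SqlConnection conn = new SqlConnection("connection string here"))
        using (StreamReader reader = new StreamReader(@"C:\data\input.txt"))
        {
            conn.Open();
            SqlTransaction tx = conn.BeginTransaction();

            // Placeholder table and columns -- match your real schema.
            SqlCommand cmd = new SqlCommand(
                "INSERT INTO dbo.Target (Col1, Col2) VALUES (@c1, @c2)", conn, tx);
            cmd.Parameters.Add("@c1", SqlDbType.VarChar, 100);
            cmd.Parameters.Add("@c2", SqlDbType.VarChar, 100);

            int count = 0;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] f = line.Split('|');
                cmd.Parameters["@c1"].Value = f[0];
                cmd.Parameters["@c2"].Value = f[1];
                cmd.ExecuteNonQuery();

                if (++count % 1000 == 0)      // commit every ~1000 rows
                {
                    tx.Commit();
                    tx = conn.BeginTransaction();
                    cmd.Transaction = tx;
                }
            }
            tx.Commit();                      // final partial batch
        }
    }
}
```

Reusing one prepared, parameterized command avoids both re-parsing the statement per row and any escaping problems with the field values.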


You could use SSIS for the read & insert, but call it as a package from your WinForms app. Then you could pass in things like source, destination, connection strings, etc. as parameters/configurations.


You can set up transforms and error handling inside SSIS and even create logical branching based on input parameters.

+1  A: 

You might consider switching the database from the full recovery model to bulk-logged for the duration of the load. This will help keep your transaction log (and its backups) a reasonable size.
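Assuming the database is named MyDb (a placeholder), the switch is two T-SQL statements that your app or a DBA could run around the load; take a log or full backup afterwards to restart the log chain:

```csharp
using System.Data.SqlClient;

class RecoverySwitch
{
    static void Exec(SqlConnection conn, string sql)
    {
        using (SqlCommand cmd = new SqlCommand(sql, conn))
            cmd.ExecuteNonQuery();
    }

    static void Main()
    {
        using (SqlConnection conn = new SqlConnection("connection string here"))
        {
            conn.Open();
            // Bulk operations are minimally logged under this model.
            Exec(conn, "ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED");

            // ... perform the bulk load here ...

            Exec(conn, "ALTER DATABASE MyDb SET RECOVERY FULL");
        }
    }
}
```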

Dave DuPlantis
+1  A: 

I totally recommend SSIS, you can read in millions of records and clean them up along the way in relatively little time.

You will need to set aside some time to get to grips with SSIS, but it should pay off. There are a few other threads here on SO which will probably be useful:

You can also create a package from C#. I have a C# program which reads a 3GL "master file" from a legacy system (parses into an object model using an API I have for a related project), takes a package template and modifies it to generate a package for the ETL.

Cade Roux