views: 4361

answers: 5

Hi,

I am currently trying to import a semicolon-delimited text file into a database in C# using OleDb, where I don't know the target database type (SQL Server, Access, Oracle, MySQL, PostgreSQL, etc.). Currently I'm reading in the file as a database using the Jet text reader, then creating a prepared insert statement, populating the fields, and committing at the end (roughly the sketch below). While that works, it's slow, and for millions of rows it takes way too long.
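
For reference, a minimal sketch of that loop, reading with a plain StreamReader rather than the Jet text driver for brevity; the connection string, file path, table, and column names are all placeholders:

    using System;
    using System.Data.OleDb;
    using System.IO;

    // One prepared INSERT, executed once per row inside a single transaction.
    // "ImportTarget" and the three-column layout are made-up placeholders.
    using (var conn = new OleDbConnection(connectionString))
    {
        conn.Open();
        using (OleDbTransaction tx = conn.BeginTransaction())
        using (var cmd = new OleDbCommand(
            "INSERT INTO ImportTarget (Col1, Col2, Col3) VALUES (?, ?, ?)", conn, tx))
        {
            cmd.Parameters.Add("@p1", OleDbType.VarChar, 255);
            cmd.Parameters.Add("@p2", OleDbType.VarChar, 255);
            cmd.Parameters.Add("@p3", OleDbType.VarChar, 255);
            cmd.Prepare();

            using (var reader = new StreamReader(filePath))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] fields = line.Split(';');
                    cmd.Parameters[0].Value = fields[0];
                    cmd.Parameters[1].Value = fields[1];
                    cmd.Parameters[2].Value = fields[2];
                    cmd.ExecuteNonQuery(); // one round trip per row; this is the bottleneck
                }
            }
            tx.Commit();
        }
    }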

So my question: does anybody have any other thoughts on how best to import a text file into a generic database, or comments on my approach that would lead to a faster import?

I cannot use 3rd-party libraries or software to do this, as it is part of a larger project.

A: 

Your best bet is to buy an off-the-shelf application for doing this.

Professional off-the-shelf applications use native drivers and are fine-tuned for each type of data source they run against. This is all under the covers, so you don't see how they go about it. For example, bulk copy is used against SQL Server; Oracle has Data Pump.
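
For instance, if the target is known to be SQL Server, the native path looks roughly like this; the table name, connection string, and source DataTable are assumptions:

    using System.Data;
    using System.Data.SqlClient;

    // SqlBulkCopy streams rows to SQL Server far faster than per-row INSERTs.
    // "dbo.ImportTest" and sourceTable are placeholders for this sketch.
    using (var conn = new SqlConnection(sqlConnectionString))
    {
        conn.Open();
        using (var bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "dbo.ImportTest";
            bulk.BatchSize = 10000;           // send rows to the server in chunks
            bulk.WriteToServer(sourceTable);  // sourceTable: DataTable built from the file
        }
    }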

The problem with rolling your own is that you either spend the money to fine-tune your application for each of the source types you're likely to encounter, or you take a huge performance hit by using generic ODBC/ADO/whatever drivers.

At the end of the day, you are better off either leaving this out of your product or just accepting the inevitably slow approach you are forced to take. In this case that means using single insert statements for everything.

So, how much money do you have for development resources?

Chris Lively
Unfortunately this is a portion of a much bigger program, so that is not an option.
Fry
See my comment to IronGoofy; I actually am doing that, but this is a catch-all and I'm trying to make it faster.
Fry
+1  A: 

Try this

http://filehelpers.sourceforge.net

Why would you want to load the db into the dataset? Have another database keep track of uniqueness (if there is such a word). While importing, check whether the row exists in the logging database; if not, load it into the generic database (see the sketch below).
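
A rough sketch of that check-then-load idea, assuming a prepared lookup against the tracking database; every name here is hypothetical:

    using System;
    using System.Data.OleDb;

    // logConn is an open connection to the tracking database; "ImportLog",
    // "LineKey", and how the key is derived are all made up for this sketch.
    using (var check = new OleDbCommand(
        "SELECT COUNT(*) FROM ImportLog WHERE LineKey = ?", logConn))
    {
        check.Parameters.Add("@key", OleDbType.VarChar, 255);
        check.Prepare();

        check.Parameters[0].Value = key; // key derived from the current line
        int seen = Convert.ToInt32(check.ExecuteScalar());
        if (seen == 0)
        {
            insertCmd.ExecuteNonQuery(); // the prepared INSERT from the question
            // ...then log the key in ImportLog so later duplicates are skipped
        }
    }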

Wait for some other responses to this thread, we may get a better idea.

Saif Khan
I want to load both into a dataset so they are both in the same format; the ultimate goal would be to load the text into a dataset, copy it into the db dataset, then insert that data into the database.
Fry
+1  A: 

Not exactly elegant, but performance may be better:

  • load the whole file into a table with just one column "Line" as long text (similar to what you do now locally)
  • use stored procedures to split the fields apart and create the inserts
  • execute the inserts on the server

While you are still inserting each line separately, you wouldn't create nearly as much network traffic.

To elaborate: the original method generates the statements on the client and then executes them from the client, resulting in network traffic for each line. My suggestion is to generate the statements on the server (in a stored procedure) and execute them there, resulting in no additional network traffic beyond shipping the raw lines (see the sketch below).
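
A sketch of what that might look like; "RawLines" and "SplitAndInsertLines" are hypothetical names, and not every database exposes stored procedures the same way:

    using System.Data;
    using System.Data.OleDb;
    using System.IO;

    // conn is an open OleDbConnection and tx a transaction on it,
    // as in the question's original approach.

    // Step 1: ship each raw line, unparsed, into a one-column staging table.
    using (var cmd = new OleDbCommand("INSERT INTO RawLines (Line) VALUES (?)", conn, tx))
    {
        cmd.Parameters.Add("@line", OleDbType.LongVarChar);
        cmd.Prepare();
        using (var reader = new StreamReader(filePath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                cmd.Parameters[0].Value = line;
                cmd.ExecuteNonQuery();
            }
        }
    }

    // Step 2: have the server split the lines and run the real INSERTs,
    // so no per-field SQL text crosses the network.
    using (var proc = new OleDbCommand("SplitAndInsertLines", conn, tx))
    {
        proc.CommandType = CommandType.StoredProcedure;
        proc.ExecuteNonQuery();
    }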

The "correct" solution would be to use a database specific import tool (like SQL Loader for Oracle). The performance gains are enormous. (We are loading huge tables with 20 million lines in about 5 minutes). But of course, that is not very generic.

IronGoofy
True, that would be the "correct" solution, and that is indeed what I'm doing, but I need a catch-all that works with all databases in case an error occurs or it is not supported. It sounds like your suggestion is approximately what I'm doing right now; can you point out the differences?
Fry
Updated the answer; hope this makes things a bit clearer.
IronGoofy
Prepared statements and stored queries are executed the same way in this context. Although the statement is created on the client, as soon as Prepare() is executed it is treated as a stored query on the server.
Fry
+1  A: 

Well, I managed to get the rows of the text file into the database dataset, and so far this method seems to be quicker. I used

Dataset.Tables[x].ImportRow(DataRow)

Of course now it's just getting the DataAdapter.Update(Dataset) to work. Looking online that is gonna be fun...
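
For reference, a rough sketch of that round trip, assuming an OleDbCommandBuilder supplies the INSERT command for Update(); table and variable names are placeholders:

    using System.Data;
    using System.Data.OleDb;

    // conn is an open OleDbConnection; textTable holds the rows parsed
    // from the file; "ImportTarget" is a placeholder table name.
    var adapter = new OleDbDataAdapter("SELECT * FROM ImportTarget", conn);
    var builder = new OleDbCommandBuilder(adapter); // supplies the INSERT used by Update()
    var ds = new DataSet();
    adapter.Fill(ds, "ImportTarget");

    DataTable target = ds.Tables["ImportTarget"];
    foreach (DataRow row in textTable.Rows)
    {
        // ImportRow preserves the source row's RowState; rows need to be
        // in the Added state for Update() to generate INSERTs for them.
        target.ImportRow(row);
    }

    adapter.Update(ds, "ImportTarget"); // still issues one INSERT per new row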

Update

This method does not yield quicker results, as the DataAdapter.Update command still does line-by-line insertions.

Fry
A: 

BULK INSERT dbo.ImportTest FROM 'C:\ImportData.txt' WITH ( FIELDTERMINATOR =',', FIRSTROW = 2 )

That looks specific to SQL Server.
Fry