tags:
views: 369
answers: 5

Is there a way to speed up inserts to a mdb?

    string line;
    using (StreamReader sr = new StreamReader(_localDir + "\\" + _filename))
    {
        while ((line = sr.ReadLine()) != null)
        {
            //sanitize the data
        }
    }

This takes about 20 seconds for ~2 million records from a CSV, but when I add in the mdb insert I can barely get 10,000 records in 10 minutes, so you can see it'll take forever:

    string line;
    using (StreamReader sr = new StreamReader(_localDir + "\\" + _filename))
    {
        while ((line = sr.ReadLine()) != null)
        {
            //sanitize the data
            using (OleDbConnection con = new OleDbConnection(_conStr))
            using (OleDbCommand cmd = new OleDbCommand())
            {
                cmd.Parameters.AddWithValue... //I have 22 params
                cmd.ExecuteNonQuery();
            }
        }
    }

Is there a better way? Connection pooling? Threading? Here is my connection string: Provider=Microsoft.Jet.OLEDB.4.0;Data Source=mypath;Jet OLEDB:Engine Type=5

Regards

_Eric

+2  A: 

You'd probably realize some performance benefit by moving the loop inside the using blocks: create one connection/command and execute it N times instead of creating N connections/commands.
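A minimal sketch of that restructuring, reusing the fields from the question (`_localDir`, `_filename`, `_conStr`); the table name, two-column parameter list, and `value1`/`value2` stand in for the 22 real parameters and are hypothetical:

```csharp
string line;
using (StreamReader sr = new StreamReader(Path.Combine(_localDir, _filename)))
using (OleDbConnection con = new OleDbConnection(_conStr))
using (OleDbCommand cmd = con.CreateCommand())
{
    con.Open(); // open the connection once, not once per row
    cmd.CommandText = "INSERT INTO MyTable VALUES (@p1, @p2)";
    while ((line = sr.ReadLine()) != null)
    {
        // sanitize the data, then bind this row's values
        cmd.Parameters.Clear();
        cmd.Parameters.AddWithValue("@p1", value1);
        cmd.Parameters.AddWithValue("@p2", value2);
        cmd.ExecuteNonQuery(); // reuses the single open connection
    }
}
```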

Ken Browning
I'll give that a shot now and post the results, thanks
Eric
using (StreamReader sr = new StreamReader(_localDir + "\\" + _filename)) using (OleDbConnection con = new OleDbConnection(_conStr)) using (OleDbCommand cmd = new OleDbCommand()) while ((line = sr.ReadLine()) != null) ... didn't really help; it did 275 records in 1 min, but thanks
Eric
A: 

Another change that might speed it up a little more is to prepare the command one time and create all the parameters up front. Then, in the loop, just assign the parameter values and execute it each time. That may avoid the parsing and semantic checking of the statement on each iteration and should improve the time somewhat. However, I don't think it would be a significant improvement; the statement parsing should be a relatively small portion of the total cost even if it is parsed every time.
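A sketch of that prepare-once pattern, assuming an already-open OleDbConnection `con` and a StreamReader `sr` as in the question; the table, column types, and the `Split` parsing are illustrative, and note that OleDbCommand.Prepare() requires explicit types/sizes on the parameters:

```csharp
using (OleDbCommand cmd = con.CreateCommand())
{
    // Jet treats parameters positionally, so ? placeholders work fine
    cmd.CommandText = "INSERT INTO MyTable VALUES (?, ?)";
    OleDbParameter p1 = cmd.Parameters.Add("@p1", OleDbType.VarChar, 255);
    OleDbParameter p2 = cmd.Parameters.Add("@p2", OleDbType.Integer);
    cmd.Prepare(); // parse and compile the statement once

    string line;
    while ((line = sr.ReadLine()) != null)
    {
        string[] fields = line.Split(','); // sanitize as needed
        p1.Value = fields[0];              // only the values change per row
        p2.Value = int.Parse(fields[1]);
        cmd.ExecuteNonQuery();
    }
}
```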

Mark Wilkins
Thanks, I think that's how I'm currently trying it: after the using (OleDbCommand cmd = new OleDbCommand()) I set the cmd to the con and open it, and also set cmd.CommandText to INSERT INTO MyTable VALUES(@p1,@p2...). Then, before I execute the command in the while ((line = sr.ReadLine()) loop, I do a cmd.Parameters.Clear() and reassign for the current row. I can even loop with everything but the ExecuteNonQuery and it's pretty peppy: without the ExecuteNonQuery I can loop the 2M+ records in 32514 ms; when I add the ExecuteNonQuery I get ~80,000 per minute.
Eric
@Eric: Is the .mdb file on a local drive or on a network drive?
Mark Wilkins
@Mark, it is local. I have given up on this approach, as I can read -> sanitize -> write out the CSV in ~40 sec, then just call Process.Start("msaccess.exe", myNewFile) and it's all done in less than a minute.
Eric
@Eric: Cool. That works. I'm glad you found a fast solution.
Mark Wilkins
+2  A: 

Is it possible for you to use a query that inserts directly from the CSV? For example:

SELECT ID,Field1 INTO NewTable 
FROM [Text;HDR=YES;FMT=Delimited;IMEX=2;DATABASE=C:\Docs\].Some.CSV

You can use something similar with non-standard delimiters, but you will need a Schema.ini file in the same directory as the file to be imported. It need only contain:

[tempImportfile.csv]
TextDelimiter='

You will have to alter the connect string slightly, this seems to work:

Text;HDR=YES;FMT=Delimited;DATABASE=C:\Docs\
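If you're driving this from the C# side, the whole import is one statement executed over an OleDb connection to the .mdb; the database path and target table below are illustrative:

```csharp
// A single SELECT ... INTO imports the entire CSV in one batch.
string conStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\Docs\\MyDb.mdb;";
using (OleDbConnection con = new OleDbConnection(conStr))
using (OleDbCommand cmd = con.CreateCommand())
{
    con.Open();
    cmd.CommandText =
        "SELECT ID, Field1 INTO NewTable " +
        "FROM [Text;HDR=YES;FMT=Delimited;DATABASE=C:\\Docs\\].Some.CSV";
    cmd.ExecuteNonQuery();
}
```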
Remou
I'll give this a shot, but do you know the actual connection string to set the delimiter to comma and the single quote as the text qualifier? http://www.connectionstrings.com/textfile is not too clear. Thanks
Eric
FMT=Delimited should do that.
Remou
The above is a query that should run from the Access connection.
Remou
When I try Select * into MyTable FROM [Text;HDR=YES;FMT=Delimited;IMEX=2;DATABASE=C:\\temp\\tempImportfile.csv] I always get ex.Message "The Microsoft Jet database engine cannot find the input table or query 'Text;HDR=YES;FMT=Delimited;IMEX=2;DATABASE=C:\\temp\\tempImportfile.csv'. Make sure it exists and that its name is spelled correctly." It is there. Thanks
Eric
You seem to have the square bracket in the wrong place: [Text;HDR=YES;FMT=Delimited;IMEX=2;DATABASE=C:\temp\].tempImportfile.csv — the connect string goes between the [] as the DB, and the file goes outside as the 'table'.
Remou
Thanks, that worked, but it is not removing the text qualifiers. Is there a way to tell it to do that, i.e. remove the single quotes around the fields?
Eric
Accepted, thanks. This worked great! Took 64708 ms. Again, my only issue is how to specify in the cmd.CommandText to strip the text qualifier: 'myField' should just be myField.
Eric
Theoretically this should work: http://support.microsoft.com/kb/205439, but I am having difficulties with it at the moment. The single quote is a non-standard delimiter, and I am afraid I missed that note in your comment. It would also be possible to use the Left, Right and Mid functions to strip the text delimiter, but try the schema.ini first.
Remou
The single quote is not the delimiter, it's the text qualifier: heading1,Heading2 then 'h1value','h2value'. But I see in that article they call it the delimiter also — confused!!
Eric
Okay, I got schema.ini working, I will post an edit above.
Remou
I should have said text delimiter or text qualifier.
Remou
Yup, that did it, thanks
Eric
A: 

Microsoft Jet's SQL parsing (INSERT/UPDATE) is slow in general. In other words, you may have the most efficient code possible, but the choke point is Jet. Keep in mind that in your original posting you're connecting (open file, create lock, seek file, insert row, dispose lock, close file, dispose object) for every line. You need to connect ONCE (outside the while), read the lines, build the SQL (OleDbCommand), and then execute.

Michael
This is simply not true. Jet is usually very FAST for this kind of thing if you tell it to do the right thing. The code in the original question is doing it one row at a time, so far as I understand it, and that's going to be vastly slower than doing the insert in a batch (which is the point of comparison). Any database engine is going to be slower executing 2 million one-row inserts than it will be inserting 2 million rows in one batch.
David-W-Fenton
I'd like to see your benchmarks compared with mine. I've found that going via Interop is faster when talking to Excel or Access, bypassing Jet altogether. If I can see how Jet is more efficient, then I'll eat my down-vote handily. With respect to batch processing or no batch processing, that was not what I was addressing in my point. Please note my additional comment on his approach of opening/closing (rinse, repeat) the same mdb file for each line and how that is not efficient whatsoever.
Michael
If you're talking to Access/Jet data, you cannot possibly be bypassing Jet! One can use inefficient methods to talk to Jet and get bad performance, which is what was happening in the original question (i.e., a succession of one-row inserts, instead of a batch insert).
David-W-Fenton
A: 

    [tempImportfile.csv]
    TextDelimiter=,

That doesn't work for me. I have a comma-separated file with values in " " and OleDb can't parse it. Do you have any ideas why?

3P