Here's the scenario:

I have a 1.8 million line text file that I need to load into a SQL table. The code I have works, just slowly (about 250k lines per day). Unfortunately I have about 4 text files this size to go through, so I need a way to speed up the process. Any help would be appreciated. If some of the code doesn't look right, it's because I omitted some things for privacy. I know I could chop out the File.AppendAllText, but I use that to track progress, and I also do the start++ so I can pick up the next day without stopping backups.

DirectoryInfo dinfo = new DirectoryInfo(ocrdirectory);
FileInfo[] Files = dinfo.GetFiles("*.txt");
foreach (FileInfo filex in Files)
{
    string[] primaryfix = File.ReadAllLines(dinfo + "\\" + filex);
    string filename = filex.ToString();
    string[] spltifilename = filename.Split('.');
    foreach (string primary in primaryfix)
    {
        string sqltable = ("dbo.amu_Textloadingarea");
        string sql = "update " + sqltable +
                     " set [Text] = [Text] + '" + primary + "|" +
                     "' where unique = '" + spltifilename[0] + "'";
        File.AppendAllText(@"C:\convert\sqltest.txt", sql + "\n");
        SqlConnection con = new SqlConnection("Data Source= Cote ;Initial Catalog= eCASE;Integrated Security= SSPI");
        con.Open();
        SqlCommand cmd = new SqlCommand(sql, con);
        SqlDataReader reader = cmd.ExecuteReader();
        con.Close();
        Console.WriteLine(start);
        start++;
    }
}
+9  A: 

You need to look into BULKCOPY. In particular, look at the SqlBulkCopy class.

http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx

http://msdn.microsoft.com/en-us/library/7ek5da1a%28VS.80%29.aspx

A nice blog entry on using SqlBulkCopy to load a large text file (1M+ rows) into a DB:

http://blog.stevienova.com/2008/01/16/net-fastest-way-to-load-text-file-to-sql-sqlbulkcopy/
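For reference, a minimal SqlBulkCopy sketch along these lines. The column names, the key value, and the connection string below are placeholders, not from the original post; the destination table is the question's dbo.amu_Textloadingarea:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

class BulkLoader
{
    static void Main()
    {
        // Stage the file in memory as rows instead of issuing one UPDATE per line.
        var table = new DataTable();
        table.Columns.Add("UniqueId", typeof(string));  // placeholder column names
        table.Columns.Add("Text", typeof(string));

        foreach (string line in File.ReadLines(@"C:\convert\input.txt"))  // placeholder path
            table.Rows.Add("somekey", line + "|");      // placeholder key value

        using (var bulk = new SqlBulkCopy("Data Source=.;Initial Catalog=eCASE;Integrated Security=SSPI"))
        {
            bulk.DestinationTableName = "dbo.amu_Textloadingarea";
            bulk.BatchSize = 10000;          // commit in batches
            bulk.WriteToServer(table);       // one round trip per batch, not per line
        }
    }
}
```

Bulk-copying into a staging table and then concatenating server-side (or loading whole lines directly) avoids the per-row connection and round trip that dominates the original code's runtime.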

code4life
+7  A: 

I would suggest trying a scheduled SSIS solution or a bcp solution rather than going through code.

Also, from looking at your code, you have structured it for the least possible efficiency. While what you have would work fine in a typical application, it is definitely not good for bulk-type operations. Your issues:

  • You're creating db objects with every loop iteration that only need to be created once
  • You're re-connecting to the db every time. This also needs to be done once (per session)
  • You need to break up this method... but, that's still not the best solution

As I and others have mentioned, you really need to work out a special, bulk insert solution.

Paul Sasik
BCP would be a better choice.
saurabh
+2  A: 

I would try to use SQL Server SSIS for such a requirement. SSIS has package variables that allow dynamic substitution of values.

Girish
A: 

All of the above. However, a few simple speedups would be to use a StringBuilder instead of string concatenation and to keep the database connection open; these should save you lots of time.

Dan H
string.Join outperforms StringBuilder, unless he wants to take the time to figure out the size of the built string. (To be fair, string.Join uses a StringBuilder internally, but optimizes it.)
marr75
You're right, string.Join is faster and makes better sense.
Dan H
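To illustrate the point in this thread: since every line gets the same "|" delimiter appended, the whole file's worth of text can be built in a single call rather than concatenated line by line inside the loop (the file path below is a placeholder):

```csharp
using System;
using System.IO;

class JoinDemo
{
    static void Main()
    {
        string[] lines = File.ReadAllLines(@"C:\convert\input.txt");  // placeholder path

        // One "|"-delimited string for the whole file, built in a single pass,
        // with a trailing delimiter to match the original per-line format.
        string joined = string.Join("|", lines) + "|";

        Console.WriteLine(joined.Length);
    }
}
```

With the text pre-joined, one UPDATE per file replaces 1.8 million UPDATEs per file.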
+2  A: 

For starters, only open your connection once (outside your loop). Also, call cmd.ExecuteNonQuery() instead of creating a reader you're just going to throw away. You might also consider creating your SqlCommand once (after you open your connection) and just reassigning the SQL statement to the .CommandText property. If you do that, also make it a prepared statement and just assign your values to the parameters.
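Put together, that advice might look like the sketch below. The file path and key value are placeholders; the connection string and table come from the question, and `unique` is bracketed because it is a reserved word in T-SQL:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

// Open once, prepare once, execute many times.
using (var con = new SqlConnection("Data Source= Cote ;Initial Catalog= eCASE;Integrated Security= SSPI"))
{
    con.Open();
    using (var cmd = new SqlCommand(
        "update dbo.amu_Textloadingarea set [Text] = [Text] + @chunk where [unique] = @key", con))
    {
        cmd.Parameters.Add("@chunk", SqlDbType.VarChar, 8000);
        cmd.Parameters.Add("@key", SqlDbType.VarChar, 255);
        cmd.Prepare();  // server compiles the statement once

        foreach (string primary in File.ReadAllLines(@"C:\convert\input.txt"))  // placeholder path
        {
            cmd.Parameters["@chunk"].Value = primary + "|";
            cmd.Parameters["@key"].Value = "somekey";  // placeholder key
            cmd.ExecuteNonQuery();                     // no throwaway reader
        }
    }
}
```

Parameterizing also removes the SQL-injection risk of splicing file contents directly into the statement, though a bulk-load approach will still be far faster than row-at-a-time updates.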

TMN