views: 191

answers: 3
I have the following method to insert millions of rows of data into a table (I use SQL Server 2008), and it seems slow. Is there any way to speed up the INSERTs?

Here is the code snippet - I use the Microsoft Enterprise Library.

        public void InsertHistoricData(List<DataRow> dataRowList)
        {
            string sql = @"INSERT INTO [MyTable] ([Date],[Open],[High],[Low],[Close],[Volumn])
                VALUES( @DateVal, @OpenVal, @High, @Low, @CloseVal, @Volumn )";

            DbCommand dbCommand = VictoriaDB.GetSqlStringCommand( sql );
            DB.AddInParameter(dbCommand, "DateVal", DbType.Date);
            DB.AddInParameter(dbCommand, "OpenVal", DbType.Currency);
            DB.AddInParameter(dbCommand, "High", DbType.Currency );
            DB.AddInParameter(dbCommand, "Low", DbType.Currency);
            DB.AddInParameter(dbCommand, "CloseVal", DbType.Currency);
            DB.AddInParameter(dbCommand, "Volumn", DbType.Int32);

            foreach (NasdaqHistoricDataRow dataRow in dataRowList)
            {
                DB.SetParameterValue( dbCommand, "DateVal", dataRow.Date );
                DB.SetParameterValue( dbCommand, "OpenVal", dataRow.Open );
                DB.SetParameterValue( dbCommand, "High", dataRow.High );
                DB.SetParameterValue( dbCommand, "Low", dataRow.Low );
                DB.SetParameterValue( dbCommand, "CloseVal", dataRow.Close );
                DB.SetParameterValue( dbCommand, "Volumn", dataRow.Volumn );

                // One round trip to the server per row.
                DB.ExecuteNonQuery( dbCommand );
            }
        }
+5  A: 

Consider using bulk insert instead.

SqlBulkCopy lets you efficiently bulk load a SQL Server table with data from another source. The SqlBulkCopy class can only write data to SQL Server tables. However, the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded into a DataTable instance or read with an IDataReader instance. For this example the file will contain roughly 1000 records, but the code can handle much larger amounts of data.

This example first creates a DataTable and fills it with the data. This is kept in memory.

DataTable dt = new DataTable();
string line = null;
int i = 0;

using (StreamReader sr = File.OpenText(@"c:\temp\table1.csv"))
{
    while ((line = sr.ReadLine()) != null)
    {
        string[] data = line.Split(',');
        if (data.Length > 0)
        {
            // The first line determines how many columns the table needs
            // (it is still added as a data row below).
            if (i == 0)
            {
                foreach (var item in data)
                {
                    dt.Columns.Add(new DataColumn());
                }
                i++;
            }

            DataRow row = dt.NewRow();
            row.ItemArray = data;
            dt.Rows.Add(row);
        }
    }
}

Then we push the DataTable to the server in one go.

using (SqlConnection cn = new SqlConnection(ConfigurationManager.ConnectionStrings["ConsoleApplication3.Properties.Settings.daasConnectionString"].ConnectionString))
{
      cn.Open();
      using (SqlBulkCopy copy = new SqlBulkCopy(cn))
      {
            // Map source columns to destination columns by ordinal position.
            copy.ColumnMappings.Add(0, 0);
            copy.ColumnMappings.Add(1, 1);
            copy.ColumnMappings.Add(2, 2);
            copy.ColumnMappings.Add(3, 3);
            copy.ColumnMappings.Add(4, 4);
            copy.DestinationTableName = "Censis";
            copy.WriteToServer(dt);
      }
} 
smink
This is definitely the best choice here, since the question asker already has a sequence of data rows; these can be passed straight into a SqlBulkCopy (see the sketch below).
Tim Mahy
tried this and it worked, thx
sean717
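
As Tim Mahy's comment points out, the rows the question already has can be fed to SqlBulkCopy directly. A minimal sketch of that idea, assuming NasdaqHistoricDataRow derives from DataRow, that the source column names match those of [MyTable], and that the connection string is a placeholder:

// Placeholder connection string; dataRowList is the List<DataRow> from the question.
string connectionString = "...";
DataRow[] rows = dataRowList.ToArray();

using (SqlConnection cn = new SqlConnection(connectionString))
{
    cn.Open();
    using (SqlBulkCopy copy = new SqlBulkCopy(cn))
    {
        copy.DestinationTableName = "MyTable";
        copy.BatchSize = 10000;   // send rows to the server in batches

        // Source column names are assumed to match the destination table.
        copy.ColumnMappings.Add("Date", "Date");
        copy.ColumnMappings.Add("Open", "Open");
        copy.ColumnMappings.Add("High", "High");
        copy.ColumnMappings.Add("Low", "Low");
        copy.ColumnMappings.Add("Close", "Close");
        copy.ColumnMappings.Add("Volumn", "Volumn");

        // WriteToServer has an overload that accepts DataRow[].
        copy.WriteToServer(rows);
    }
}
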
A: 

Where does the data come from? Could you run a bulk insert? If so, that is the best option you could take.
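
For example, if the data starts out as a flat file that the database server itself can read, a T-SQL BULK INSERT fired from client code is one way to do it. A rough sketch, where the table name, file path and connection string are placeholders:

// Placeholder table name, file path and connection string. The file must be
// readable by the SQL Server service, not just by the client machine.
string connectionString = "...";
string bulkSql = @"BULK INSERT [MyTable]
    FROM 'C:\temp\table1.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')";

using (SqlConnection cn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(bulkSql, cn))
{
    cn.Open();
    cmd.CommandTimeout = 0;   // large loads can exceed the default 30-second timeout
    cmd.ExecuteNonQuery();
}
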

eKek0
+1  A: 

One general tip for any relational database when doing a large number of inserts, or indeed any large data change, is to drop all your secondary indexes first and then recreate them afterwards.

Why does this work? With secondary indexes, the index data lives elsewhere on the disk from the table data, so every record written to the table forces at least one additional read/write per index. In fact it may be much worse than this, because from time to time the database will decide it needs to carry out a more serious reorganisation operation on the index.

When you recreate the index at the end of the insert run, the database only needs to perform one full table scan to read and process the data. Not only do you end up with a better-organised index on disk, but the total amount of work required will be less.

When is this worthwhile? That depends upon your database, your index structure and other factors (such as whether your indexes are on a separate disk from your data), but my rule of thumb is to consider it if I am processing more than 10% of the records in a table of a million records or more - and then to check with test inserts whether it actually helps.

Of course on any particular database there will be specialist bulk insert routines, and you should also look at those.
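
A rough sketch of the drop-and-recreate pattern on SQL Server, combined with the SqlBulkCopy approach above; the index name, key column and table name are placeholders for whatever secondary indexes the table actually has:

// Placeholder index, key column and table names; cn is assumed to be an open
// SqlConnection and dt the DataTable to be loaded.
using (SqlCommand drop = new SqlCommand("DROP INDEX IX_MyTable_Date ON [MyTable]", cn))
{
    drop.ExecuteNonQuery();                  // drop the secondary index first
}

using (SqlBulkCopy copy = new SqlBulkCopy(cn))
{
    copy.DestinationTableName = "MyTable";
    copy.WriteToServer(dt);                  // bulk load with no index maintenance in the way
}

using (SqlCommand create = new SqlCommand("CREATE INDEX IX_MyTable_Date ON [MyTable] ([Date])", cn))
{
    create.CommandTimeout = 0;               // rebuilding can take a while on a large table
    create.ExecuteNonQuery();                // one pass over the table rebuilds the index
}
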

Cruachan
