I am loading a bunch of rows into MySQL in C#. In MS SQL I can feed a DataReader to SqlBulkCopy, but MySqlBulkCopy only presents itself as a bootstrap for a bulk load from a file.

So, my current solution is using a prepared command in a transacted loop.

Is there a faster way to accomplish bulk loading of MySQL using a DataReader source?

Here is the code:

    public override void WriteToServer(IDataReader reader)
    {
        // {0} = table, {1} = column list, {2} = placeholder list, {3} = schema
        const string insertFormat = "insert into `{3}`.`{0}` ({1}) values ({2});";

        // Build the column list and the matching ?-prefixed placeholder list
        // from the pre-populated parameter collection.
        string names = string.Join(",",
            _command.Parameters.Cast<MySqlParameter>()
                .Select(p => p.ParameterName).ToArray());
        string vals = string.Join(",",
            _command.Parameters.Cast<MySqlParameter>()
                .Select(p => "?" + p.ParameterName).ToArray());

        _command.CommandText = string.Format(insertFormat, _table, names, vals, _schema);

        int reportCounter = 0;
        int totalRecords = 0;
        bool finished = false;

        using (var connection = new MySqlConnection(_source))
        {
            connection.Open();
            _command.Connection = connection;
            _command.Prepare();

            // Insert up to BatchSize rows per transaction until the reader is drained.
            while (!finished)
            {
                using (MySqlTransaction dbTrans = connection.BeginTransaction(IsolationLevel.ReadUncommitted))
                {
                    for (int i = 0; i < BatchSize; i++)
                    {
                        if (!reader.Read())
                        {
                            finished = true;
                            break;
                        }

                        try
                        {
                            // Bind the current row positionally onto the prepared command.
                            for (int p = 0; p < _command.Parameters.Count; p++)
                            {
                                _command.Parameters[p].Value = reader.GetValue(p);
                            }
                            _command.ExecuteNonQuery();
                        }
                        catch (Exception ex)
                        {
                            // A failed row is logged and skipped rather than aborting the batch.
                            Trace.WriteLine(ex.Message);
                        }
                        reportCounter++;
                        totalRecords++;

                        if (reportCounter >= NotifyAfter)
                        {
                            reportCounter = 0;
                            OnSqlRowsCopied(new SqlRowsCopiedEventArgs(totalRecords));
                        }
                    }
                    dbTrans.Commit();
                }
            }
        }
    }
A: 

Aside from using LOAD DATA INFILE, MySQL has a (non-standard-SQL) multiple-row insert mechanism where you can specify several VALUES tuples in a single statement: http://dev.mysql.com/doc/refman/5.0/en/insert.html

INSERT INTO x (a, b, c, d, e, f, g, ...)
VALUES ( ?, ?, ?, ?, ?, ?, ?, ... )
     , ( ?, ?, ?, ?, ?, ?, ?, ... )
     , ( ?, ?, ?, ?, ?, ?, ?, ... )
     , ( ?, ?, ?, ?, ?, ?, ?, ... )
     , ( ?, ?, ?, ?, ?, ?, ?, ... )

This example inserts a block of 5 rows. You can use prepared statements for better performance than SQL generated per row. The downside is that after you've loaded millions of records, you may have, say, only 3 rows remaining to insert, so you need to re-prepare your SQL as a 3-row insert. Do not be tempted to pad the missing 4th and 5th rows with NULLs unless you are using INSERT IGNORE, and that is slower than a typical insert anyway. The re-prepare is very fast and well worth it.
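A rough, untested C# sketch of this block-insert approach, to make the shape concrete. BuildMultiRowInsert, Flush, and the p{row}_{col} parameter names are invented for illustration; the ?-placeholder style matches the Connector/NET code in the question, and the connection is assumed to be open already:

    // Build and prepare an INSERT with 'rows' VALUES tuples.
    static MySqlCommand BuildMultiRowInsert(MySqlConnection connection,
        string schema, string table, string[] columns, int rows)
    {
        var cmd = connection.CreateCommand();
        var tuples = new string[rows];
        for (int r = 0; r < rows; r++)
        {
            var holders = new string[columns.Length];
            for (int c = 0; c < columns.Length; c++)
            {
                string name = "p" + r + "_" + c;
                holders[c] = "?" + name;
                cmd.Parameters.Add(new MySqlParameter(name, DBNull.Value));
            }
            tuples[r] = "(" + string.Join(",", holders) + ")";
        }
        cmd.CommandText = string.Format("insert into `{0}`.`{1}` ({2}) values {3};",
            schema, table, string.Join(",", columns), string.Join(",", tuples));
        cmd.Prepare();
        return cmd;
    }

    // Bind a buffered block onto the prepared command and execute it.
    static void Flush(MySqlCommand cmd, List<object[]> rows)
    {
        for (int r = 0; r < rows.Count; r++)
            for (int c = 0; c < rows[r].Length; c++)
                cmd.Parameters["p" + r + "_" + c].Value = rows[r][c];
        cmd.ExecuteNonQuery();
        rows.Clear();
    }

    // Drain the reader in fixed-size blocks; re-prepare once for the tail.
    var cmd = BuildMultiRowInsert(connection, _schema, _table, columns, blockRows);
    var buffer = new List<object[]>(blockRows);
    while (reader.Read())
    {
        var row = new object[columns.Length];
        reader.GetValues(row); // assumes reader column order matches 'columns'
        buffer.Add(row);
        if (buffer.Count == blockRows)
            Flush(cmd, buffer);
    }
    if (buffer.Count > 0) // leftover rows: re-prepare with the smaller count
        Flush(BuildMultiRowInsert(connection, _schema, _table, columns, buffer.Count), buffer);

With this shape only two Prepare() calls happen for the whole load: one for the full block size and one for the tail.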

We have a table where the insert block size is 200+ rows! The maximum sensible row count per insert depends on the memory size your operating system treats as the toggle point between mmap() and malloc(). On Solaris 10 we use "4096 / row_size = rows_per_insert"; for example, a ~128-byte row gives 4096 / 128 = 32 rows per INSERT. There is a MySQL bug report somewhere about this issue, loosely related to read_buffer_size.

J Jorgenson
Agreed, and I gave that a shot and ran into a packet-size issue. I am creating the db via Command and haven't found a DDL statement to set the max packet size. So, leveraging your experience, I need to somehow programmatically determine the 'toggle point between mmap() and malloc()', set my packet size just under it, and prepare the main insert command accordingly. Sound right? Another issue that may reduce the efficacy is that the source data has large text fields.
Sky Sanders
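On the packet-size point raised in the comment above: one option is to read the server's max_allowed_packet variable and size the insert block to fit under it. A minimal sketch, assuming an open connection; the per-row byte estimate is an assumption you would measure from your own data (large text fields make it the hard part), and the 10% headroom is arbitrary:

    // Read the server's packet ceiling and derive a block size from it.
    long maxPacket;
    using (var cmd = new MySqlCommand("SELECT @@max_allowed_packet", connection))
        maxPacket = Convert.ToInt64(cmd.ExecuteScalar());

    int estimatedRowBytes = 1024; // assumption: measure from real rows, esp. text columns
    int rowsPerInsert = Math.Max(1, (int)(maxPacket * 9 / 10 / estimatedRowBytes));

Note that max_allowed_packet is a server variable rather than DDL: it can be raised with SET GLOBAL max_allowed_packet = ... (requires the SUPER privilege and applies to new connections) or in the server configuration file.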