views:

45

answers:

2

For my own edification, I decided to compare the speed of DataTable.ImportRow against DataTable.Merge. I found that ImportRow was consistently slower than Merge. On rare occasions the two had equal processing time, and on even rarer occasions ImportRow was faster than Merge.

Below are my testing results and code.

  1. Why is ImportRow slower than Merge?
  2. What makes Merge faster?

[Screenshot: timing results, ImportRow vs. Merge]

    DataTable dt = new DataTable();

    dt.Columns.Add("customerId", typeof(int));
    dt.Columns.Add("username", typeof(string));

    for (int i = 0; i <= 100000; i++)   // note: <= adds 100,001 rows
    {
        DataRow myNewRow = dt.NewRow();
        myNewRow["customerId"] = 1;
        myNewRow["username"] = "johndoe";
        dt.Rows.Add(myNewRow);
    }

    // First Duration
    DateTime startTime1 = DateTime.Now;

    // Clone() copies the schema only, not the rows
    DataTable dt2 = dt.Clone();

    for (int i = 0; i < dt.Rows.Count; i++)
        dt2.ImportRow(dt.Rows[i]);

    DateTime stopTime1 = DateTime.Now;
    // End First Duration

    TimeSpan duration1 = stopTime1 - startTime1;

    // Second Duration
    DateTime startTime2 = DateTime.Now;

    DataTable dt3 = dt.Clone();
    dt3.Merge(dt);

    DateTime stopTime2 = DateTime.Now;
    // End Second Duration

    TimeSpan duration2 = stopTime2 - startTime2;

Edit: Updated code as per the suggestions below:

    DataTable dt = new DataTable();

    dt.Columns.Add("customerId", typeof(int));
    dt.Columns.Add("username", typeof(string));

    // Make customerId the primary key (each row now gets a unique id below)
    dt.PrimaryKey = new DataColumn[] { dt.Columns["customerId"] };

    for (int i = 0; i <= 100000; i++)
    {
        DataRow myNewRow = dt.NewRow();
        myNewRow["customerId"] = i;
        myNewRow["username"] = "johndoe";
        dt.Rows.Add(myNewRow);
    }

    // First Duration
    // Stopwatch requires "using System.Diagnostics;"
    Stopwatch sw1 = Stopwatch.StartNew();

    DataTable dt2 = dt.Clone();

    for (int i = 0; i < dt.Rows.Count; i++)
        dt2.ImportRow(dt.Rows[i]);

    sw1.Stop();
    // End First Duration

    TimeSpan duration1 = sw1.Elapsed;

    // Second Duration
    Stopwatch sw2 = Stopwatch.StartNew();

    DataTable dt3 = dt.Clone();
    dt3.Merge(dt);

    sw2.Stop();
    // End Second Duration

    TimeSpan duration2 = sw2.Elapsed;

    // Use TotalMilliseconds: TimeSpan.Milliseconds is only the 0-999 ms
    // component and misreports any duration of a second or longer
    label3.Text = duration1.TotalMilliseconds.ToString();
    label4.Text = duration2.TotalMilliseconds.ToString();

[Screenshot: timing results after the update]

+1  A: 

First of all, before you draw any conclusions from these results, I would use a Stopwatch to do the timings and not DateTime.Now. Stopwatch is a much more precise measurement tool and will give more consistent results.
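
For reference, a minimal sketch of the pattern (DoWork is a hypothetical stand-in for the code under test):

    // Stopwatch lives in System.Diagnostics
    Stopwatch sw = Stopwatch.StartNew();

    DoWork();   // hypothetical placeholder for the operation being timed

    sw.Stop();
    // Elapsed is a TimeSpan; ElapsedMilliseconds is a long
    Console.WriteLine("Took {0} ms", sw.ElapsedMilliseconds);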

Otherwise, it makes sense logically that Merge could have optimizations for bulk addition, since it is designed to import many rows at once.
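
If that is what is happening, one way to probe it is to suspend notifications and index maintenance around the ImportRow loop with BeginLoadData/EndLoadData and see whether the gap closes. A sketch (that this matches what Merge does internally is only an assumption):

    DataTable dt2 = dt.Clone();

    // Turns off notifications, index maintenance, and constraint
    // checking while rows are loaded in bulk
    dt2.BeginLoadData();
    try
    {
        for (int i = 0; i < dt.Rows.Count; i++)
            dt2.ImportRow(dt.Rows[i]);
    }
    finally
    {
        // Re-enables them and validates the newly loaded rows
        dt2.EndLoadData();
    }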

Mitchel Sellers
+2  A: 
  1. Your measured differences are quite small, especially since DateTime.Now has a resolution of only about 20 ms. Use a Stopwatch (a sketch showing how to observe that granularity follows this list).

  2. You are setting customerId = 1 on every record, so it looks like you don't have a proper primary key. That makes the test very unrepresentative.

  3. Merge should be faster, as it is the one that could be optimized for bulk actions. Given that, I find it surprising how close the results are.
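
To see point 1 concretely, here is a small sketch that spins until DateTime.Now changes value and reports the step; the step size is the effective resolution of the clock:

    DateTime start = DateTime.Now;
    DateTime next = start;

    // Busy-wait until the clock visibly advances
    while (next == start)
        next = DateTime.Now;

    // Historically around 15.6 ms on Windows; newer runtimes can
    // show much smaller steps
    Console.WriteLine("DateTime.Now advanced by {0} ms",
                      (next - start).TotalMilliseconds);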

Henk Holterman
I modified the code to use a Stopwatch and a primary key. Merge is significantly faster now.
0A0D