I am doing some performance tests using .NET 3.5 against SQL Server, inserting 1 million records. When I wrap the inserts in a transaction (Serializable, RepeatableRead, or ReadUncommitted) the test runs in under 80 seconds on my system. When I remove the transaction it takes roughly 300 seconds. I would expect that using no transaction would be the fastest way to insert rows into a database, because the DBMS does not need to account for a potential rollback. What is happening here? Is this typical for SQL Server, the SQL Server ADO.NET provider, ADO.NET in general, or DBMSes in general?

I have a background in iSeries/DB2 databases. In DB2 you have to enable journalling before you can get commitment control and transactions, and journalling is relatively expensive.

What I actually wanted to do was compare SqlCommand inserts against Entity Framework inserts, but I was so surprised by these results that I wanted to find out what is going on here first.

Below is the code that I use to run the test. When I run it, it takes about 74 seconds (measured between the AtStart and AtEnd log lines):

using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
    sqlConnection.Open();
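    // Clear out any rows left over from a previous run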
    SqlCommand deleteCommand = new SqlCommand("DELETE FROM LockTest");
    deleteCommand.Connection = sqlConnection;
    deleteCommand.ExecuteNonQuery();

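    // Wrap all one million inserts in a single explicit transaction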
    using (SqlTransaction transaction = sqlConnection.BeginTransaction(System.Data.IsolationLevel.Serializable))
    {
        try
        {
            if (DEBUG) LOG.Debug("AtStart");

            SqlCommand insertCommand = new SqlCommand();
            insertCommand.Connection = sqlConnection;
            insertCommand.Transaction = transaction;

            insertCommand.CommandText = "INSERT INTO LockTest (Id, Name, Description, Type) "  + 
                "VALUES (@id, @name, @description, @type)";
            SqlParameter idParameter = new SqlParameter("@id", System.Data.SqlDbType.UniqueIdentifier);
            insertCommand.Parameters.Add(idParameter);
            SqlParameter nameParameter = new SqlParameter("@name", System.Data.SqlDbType.NVarChar, 50);
            insertCommand.Parameters.Add(nameParameter);
            SqlParameter descriptionParameter = new SqlParameter("@description", System.Data.SqlDbType.NVarChar, Int32.MaxValue);
            insertCommand.Parameters.Add(descriptionParameter);
            SqlParameter typeParameter = new SqlParameter("@type", System.Data.SqlDbType.NChar, 20);
            insertCommand.Parameters.Add(typeParameter);

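            // Prepare the parameterized statement once so it can be reused for every row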
            insertCommand.Prepare();

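            // Insert one million rows, one ExecuteNonQuery call per row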
            for (int i= 0; i < 1000000; i++)
            {
                Guid g = Guid.NewGuid();
                string s = g.ToString();
                insertCommand.Parameters["@id"].Value = g;
                insertCommand.Parameters["@name"].Value = s;
                insertCommand.Parameters["@description"].Value = DateTime.UtcNow.Ticks.ToString();
                insertCommand.Parameters["@type"].Value = "test";
                insertCommand.ExecuteNonQuery();
            }
            transaction.Commit();
        }
        catch
        {
            transaction.Rollback();
            throw;
        }

    }
    sqlConnection.Close();
}
if (DEBUG) LOG.Debug("AtEnd");
A: 

Because every command (if a transaction is not set explicitly) is implicitly wrapped in its own transaction, i.e. you end up with 1M transactions. At least that is how SQLite behaves.

Sergey Mirvoda
A: 

It has to get and release a lock on each insert if you aren't transactional. With the transaction, it can keep a lock open for multiple inserts. Less overhead.

David M
The number of locks acquired/released is identical in both cases.
Remus Rusanu
Does that not depend on the granularity of the locks that are acquired?
David M
@Remus: probably not. It would escalate the lock to page/table.
gbn
@gbn true, but that is a 'special case' and may or may not actually happen (at 1M rows it likely will). The pure 'theoretical' locks needed are identical (key locks for the record inserted). In my experience the cost of the log flush is overwhelmingly the dominant factor in the performance difference when batching transactions. Lock acquisition time is *extremely* low (low hundreds of CPU cycles) compared to log page I/O wait time.
Remus Rusanu
@David: No, because the test does individual inserts. Page locks would be considered only if the test inserted multiple records per statement (e.g. INSERT ... SELECT ...).
Remus Rusanu
@gbn: Forgot that the OP's test does one insert per statement, so that also rules out lock escalation.
Remus Rusanu
@Remus: Within the transaction, after say 500k row locks, surely it would escalate...
gbn
@gbn: nope. http://msdn.microsoft.com/en-us/library/ms184286(SQL.90).aspx: <quote>Lock escalation is triggered when a Transact-SQL statement acquires at least 5,000 locks on a single reference of a table or index, or, if the table is partitioned, a single reference of a table partition or index partition.</quote> The locks have to be acquired in a single *statement* and a single *partition reference*. 5k+ INSERT ... VALUES ... statements will not trigger it; one INSERT ... SELECT ... statement that inserts 5k+ rows will.
Remus Rusanu
+2  A: 

Log flush.

With no explicit transaction, the implicit transaction started by each statement (i.e. each INSERT) must commit. Commit cannot return until the data in the log is written to disk, which means each INSERT statement must wait for a log disk write.

Explicit transactions only have to wait when the COMMIT statement is issued; by that time every full log page has already been submitted, and the last log page probably contains several INSERTs, so the cost of the write is amortized.
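
As an illustration (a sketch only, reusing the question's LockTest schema and connectionString, with an arbitrary batch size of 10,000), the same loop can commit in batches so the commit-time log flush is paid once per batch instead of once per row:

// Sketch only: same parameterized insert as the question, committing every
// batchSize rows. connectionString and the LockTest table come from the
// question; batchSize is an arbitrary illustration value.
const int batchSize = 10000;

using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
    sqlConnection.Open();

    SqlTransaction transaction = sqlConnection.BeginTransaction();
    SqlCommand insertCommand = new SqlCommand(
        "INSERT INTO LockTest (Id, Name, Description, Type) VALUES (@id, @name, @description, @type)",
        sqlConnection, transaction);
    insertCommand.Parameters.Add("@id", System.Data.SqlDbType.UniqueIdentifier);
    insertCommand.Parameters.Add("@name", System.Data.SqlDbType.NVarChar, 50);
    insertCommand.Parameters.Add("@description", System.Data.SqlDbType.NVarChar, -1); // -1 = NVARCHAR(MAX)
    insertCommand.Parameters.Add("@type", System.Data.SqlDbType.NChar, 20);

    for (int i = 0; i < 1000000; i++)
    {
        Guid g = Guid.NewGuid();
        insertCommand.Parameters["@id"].Value = g;
        insertCommand.Parameters["@name"].Value = g.ToString();
        insertCommand.Parameters["@description"].Value = DateTime.UtcNow.Ticks.ToString();
        insertCommand.Parameters["@type"].Value = "test";
        insertCommand.ExecuteNonQuery();

        // Commit (and therefore wait for a log flush) once per batch, not once per row.
        if ((i + 1) % batchSize == 0)
        {
            transaction.Commit();
            transaction = sqlConnection.BeginTransaction();
            insertCommand.Transaction = transaction;
        }
    }

    // Commit whatever is left in the final, possibly partial, batch.
    transaction.Commit();
}

The batch size is a trade-off: larger batches mean fewer log flush waits, but more work is lost if a batch has to be rolled back.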

Update:

You can verify the log flush times in the performance counters (http://msdn.microsoft.com/en-us/library/ms189883.aspx); a query sketch for reading them from code follows the list:

  • Log Flush Wait Time: Total wait time (in milliseconds) to flush the log.
  • Log Flush Waits/sec: Number of commits per second waiting for the log flush.
  • Log Flushes/sec: Number of log flushes per second.
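
As a sketch of reading these values from code (assuming VIEW SERVER STATE permission and the same connectionString as the test), the sys.dm_os_performance_counters view exposes the same counters:

// Sketch: read the log flush counters from sys.dm_os_performance_counters
// (requires using System; using System.Data.SqlClient;). Note that the
// '/sec' counters are cumulative in this view, so to get a rate you have to
// sample twice and take the difference over the elapsed time.
using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    SqlCommand counterCommand = new SqlCommand(
        "SELECT counter_name, instance_name, cntr_value " +
        "FROM sys.dm_os_performance_counters " +
        "WHERE object_name LIKE '%:Databases%' " +
        "AND counter_name IN ('Log Flush Wait Time', 'Log Flush Waits/sec', 'Log Flushes/sec')",
        connection);
    using (SqlDataReader reader = counterCommand.ExecuteReader())
    {
        while (reader.Read())
        {
            // The DMV columns are fixed-width nchar, hence the Trim() calls.
            Console.WriteLine("{0} [{1}] = {2}",
                reader.GetString(0).Trim(),
                reader.GetString(1).Trim(),
                reader.GetInt64(2));
        }
    }
}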
Remus Rusanu
Thanks for the info so far. How about the second part of my question: is this typical for SQL Server, or do other DBMSes (e.g. MySQL, Oracle) behave the same?
Jeroen Huinink
This is typical for all write-ahead-log based databases (http://en.wikipedia.org/wiki/Write_ahead_logging). MySQL behaves the same with the InnoDB engine, and so does Oracle (there may be all sorts of knobs to control this in Oracle; I'm by no means an Oracle expert). The alternative to WAL is versioned paging (http://en.wikipedia.org/wiki/Shadow_paging), but the only commercial DB I'm aware of deploying that is Informix. Systems that don't employ WAL or versioned pages don't offer ACID, so they usually don't offer transactions (e.g. MySQL's ISAM engine).
Remus Rusanu