views: 703
answers: 7

I'm trying to insert rows of in-memory data into a table on SQL Server Express 2005. It is running what seems to me very slowly - about 5 seconds per 1000 rows inserted. I am just using a basic "INSERT INTO" command. The slowness does not depend on the table data - it is still slow with a table with one int column and no index. It has nothing to do with my software - it is just as slow running the SQL in a loop from Management Studio. There is nothing else accessing the database at the same time. On a 3 GHz Xeon (old, I know), this will take about 10 seconds to execute:

declare @i int
set @i = 0
set nocount on
while @i < 2000
begin
    insert into testdb(testcolumn)
    values (1)
    set @i = @i + 1
end

My question is: is there a better way to insert bulk in-memory data than looping on INSERT? Or is there some configuration I should change in SQL Server?

+13  A: 

You perform each insert inside its own transaction.

Beginning and committing a transaction is very expensive in SQL Server.

Enclose everything in a single transaction block:

declare @i int
set @i = 0
set nocount on
BEGIN TRANSACTION
while @i < 2000
begin
    insert into testdb(testcolumn)
    values (1)
    set @i = @i + 1
end
COMMIT

To generate sample data, you can use a recursive CTE:

WITH    q (num) AS
        (
        SELECT  1
        UNION ALL
        SELECT  num + 1
        FROM    q
        WHERE   num < 2000
        )
INSERT
INTO    testdb(testcolumn)
SELECT  1
FROM    q
OPTION (MAXRECURSION 0)

This will be faster.

Quassnoi
I believe that this command is just wrapping 1000 implicit transactions in an explicit transaction. If anyone has tested this though, I would defer to them.
marr75
I didn't realise how much of a difference this made until I tried it myself - 200,000 rows added using the poster's code wrapped in a transaction in 1.3 seconds. Without the transaction, it took 47 seconds.
CodeByMoonlight
Almost every article in my blog tagged `sqlserver` contains a script to populate the test tables. I usually use a repeatable `RAND` approach to generate the test data, so the scripts use this approach and are very fast.
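One possible shape of a repeatable `RAND` generator, offered only as an illustration and not necessarily the scripts used in those blog posts: seed RAND per row with a hash of the row number, so every run produces the same data.

WITH    q (num) AS
        (
        SELECT  1
        UNION ALL
        SELECT  num + 1
        FROM    q
        WHERE   num < 2000
        )
INSERT
INTO    testdb(testcolumn)
-- deterministic pseudo-random values between 0 and 999: same rows on every run
SELECT  CAST(RAND(CHECKSUM(HASHBYTES('MD5', CAST(num AS varchar(10))))) * 1000 AS int)
FROM    q
OPTION (MAXRECURSION 0)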
Quassnoi
awesome, thanks
Peter
Very good. I'd use the single-insert approach, so I would never have thought about looping being better inside one transaction.
gbn
A: 

Having a clustered index (usually the primary key) actually increases insert speed, so verify you have one. And running 1000 transactions against a table isn't the fastest way if you can have all of the data at once and insert it into the table in a single operation (this can be accomplished by using table-valued parameters in SQL Server 2008 or XML parameters in 2005).
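A minimal sketch of the XML-parameter idea on SQL Server 2005 (the variable name, element names, and sample values are illustrative, not from this answer):

-- assumed target table: testdb(testcolumn int)
declare @xmlData xml
set @xmlData = '<rows><r v="1"/><r v="2"/><r v="3"/></rows>'

-- shred the XML into rows and insert them in one set-based statement
insert into testdb (testcolumn)
select r.value('@v', 'int')
from   @xmlData.nodes('/rows/r') as t(r)

In a real application the XML would be built on the client and passed in as a parameter, so thousands of rows arrive in a single round trip.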

marr75
+2  A: 

In addition to indices, if your actual scenario is as per your example, you could use a set-based approach to insert 2000 records like this:

INSERT testdb(testcolumn)
SELECT 1
FROM master..spt_values
WHERE number BETWEEN 1 AND 2000
AdaTheDev
I'd change `1` to a `ROW_NUMBER`, though :)
Quassnoi
That's relying on spt_values having more than 2000 values, though. True for SQL2005/2008, but I don't like this method. My spt_values has 2193, barely enough.
CodeByMoonlight
And as you get higher numbers, you can have gaps, so agreed that you should use row_number. A more reliable table is sys.columns if you have a substantial enough schema.
Aaron Bertrand
@Quassnoi - I was just going on the opening post, which just added 1's. @CodeByMoonlight - I was just demonstrating with the exact scenario posted in the question, for which it does have enough values. If more are needed, you'd just cross join it to itself to get more. The main point I'm making here is to try a set-based approach!! I'd choose a set-based approach any day of the week wherever feasible.
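A rough sketch combining the two suggestions from this thread, ROW_NUMBER for the values and a self cross join for volume (the 100,000-row count is only an example):

-- generate 100,000 numbered rows without depending on the size of spt_values
INSERT testdb(testcolumn)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY a.number, b.number)
FROM master..spt_values a
CROSS JOIN master..spt_values b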
AdaTheDev
A: 

I would google "SQL Server tuning"... There are many books written on the subject. It is a very hard thing to solve as there are MANY things that affect speed, from query syntax, to the RAM allocated to the server, to how that RAM is divided among the parts of SQL Server, to RAID array configuration, and MANY other factors. You can have a database server optimized for inserts/updates (OLTP) or for querying (data warehouse type of stuff). In other words, don't expect a single, simple answer to this, even though your problem seems straightforward.

This is why you have database server administrators.

Or you could just not sweat the server-side issues and optimize your client code as much as possible, if timing is not very important to you.

I would look into prepared statements and transactions as a way to begin optimizing. Then look at indexing (if this is a set of inserts that does not happen very often, I would consider dropping the indexes, doing the import, and creating the indexes again). A sketch of both ideas follows.
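A minimal sketch of those two suggestions; the index name IX_testdb_testcolumn is hypothetical, and sp_executesql stands in here for a client-side prepared/parameterized command:

-- a parameterized statement: the plan is compiled once and reused
exec sp_executesql
     N'insert into testdb (testcolumn) values (@val)',
     N'@val int',
     @val = 1

-- drop/import/recreate pattern for an infrequent bulk import
DROP INDEX IX_testdb_testcolumn ON testdb
-- ... perform the inserts here ...
CREATE INDEX IX_testdb_testcolumn ON testdb (testcolumn)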

gmagana
+6  A: 

1) Log Flush on commit. Every transaction has to ensure the log is flushed to the disk before the commit returns. Every INSERT statement is an implicit transaction. Bulk commit:

declare @i int
set @i = 0
set nocount on
begin transaction
while @i < 2000
begin
    insert into testdb(testcolumn)
    values (1)
    set @i = @i + 1
    if (@i % 1000 = 0)
    begin
        commit;
        begin transaction;
    end
end
commit

2) Slow disk. Check the Avg. Disk sec/Transfer performance counter for your data and your log disks.
3) Too many indices (unlikely on a test table). Each index is nearly as expensive as a 'table' for inserts.
4) Triggers (again, unlikely)

Ultimately, measure. Follow the guidelines of a whitepaper like Troubleshooting Performance Problems in SQL Server 2005 if you don't know where to start.

Remus Rusanu
Your explanation is good, and I'd upvote, but I'm confused by your use of `commit; begin transaction;` in your loop. Your code example doesn't appear to be showing a bulk commit, but rather an explicit commit on each insert. Or am I missing something?
Daniel Pryden
It commits every 1000 inserts (@i modulo 1000 = 0 -> every 1000 inserts do a commit). I wanted to show that it is not necessarily required to commit everything in one single transaction; there could be millions of inserts. The important thing is to fill up the log page before the flush (commit).
Remus Rusanu
Good point on batched commits
Quassnoi
+2  A: 

You have plenty of tools/techniques to get more performance out of this type of workload.

  1. If appropriate, bulk load anything you can (see the BULK INSERT sketch after this list). Some things you can't: validation needs to run against the records, the destination table has nullable columns...
  2. Consider moving complex data warehousing/ETL operations to a staging database with minimal transaction logging (aka the simple recovery model). This will improve performance greatly. Then batch/bulk the data into the destination system.
  3. Batch non-bulk-load insert operations. Commit every n records; start with 1,000 and performance-tune from there.
  4. Improve the speed of your disk storage. Smaller, faster disks are much better than bigger, slower ones. On the last DB performance tuning project I worked on, we moved from local 10,000 RPM disks to a SAN, then back to solid state disks on the server for some operations. Solid state most definitely rocks! But it is expensive.
  5. Use the force, um, performance tuning tools for SQL Server to find less obvious bottlenecks. Sometimes the best course of action might be to drop and rebuild indexes based on what % of records are being inserted/deleted compared to the table size, disable triggers during certain operations, and modify the sparseness of records in data blocks.
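A minimal BULK INSERT sketch for point 1; the file path and delimiters below are placeholders, not details from the answer:

-- load a flat file straight into the table instead of row-by-row INSERTs
BULK INSERT testdb
FROM 'C:\data\testcolumn.txt'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK   -- table lock allows the load to be minimally logged
)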
Chad
A: 

Insert speed is driven by the following things:

  1. The speed of your log disk. In particular, it's important that the log be on a volume by itself, so that disk seeks don't slow things down (can be a 40x effect)
  2. The structure of your table and associated indexes / keys / triggers, etc.
  3. The size of your transactions. Larger transactions require fewer round-trips to the log disk, and less associated overhead.
  4. The size of your command batches. Larger batches are more efficient than many individual ones (a small sketch follows this list).
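As one illustration of point 4: on SQL Server 2005 a single statement can carry several rows via UNION ALL (multi-row VALUES only arrives in 2008), and sending such statements in fewer, larger batches cuts down on round trips:

-- three rows in one INSERT statement instead of three separate statements
INSERT INTO testdb (testcolumn)
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3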

In case it's of any interest, I go through this in detail in my book (Ultra-Fast ASP.NET), including benchmarks and example code.

RickNZ