views: 703
answers: 7

I'm trying to insert rows of in-memory data into a table on SQL Server Express 2005. It is running what seems to me very slowly - about 5 seconds per 1000 rows inserted. I am just using a basic "INSERT INTO" command. The slowness does not depend on the table data - it is still slow with a table with one int column and no index. It has nothing to do with my software - it is just as slow running the SQL in a loop from Management Studio. There is nothing else accessing the database at the same time. On a 3 GHz Xeon (old, I know), this will take about 10 seconds to execute:

declare @i int
set @i = 0
set nocount on
while @i < 2000
begin
    insert into testdb(testcolumn)
    values (1)
    set @i = @i + 1
end

My question is: is there a better way to insert bulk in-memory data than looping on INSERT? Or is there some configuration I should change in SQL Server?

+13  A: 

You perform each insert inside its own transaction.

Beginning and committing a transaction is very expensive in SQL Server.

Enclose everything in a single transaction block:

declare @i int
set @i = 0
set nocount on
BEGIN TRANSACTION
while @i < 2000
begin
    insert into testdb(testcolumn)
    values (1)
    set @i = @i + 1
end
COMMIT

To generate sample data, you can use a recursive CTE:

WITH    q (num) AS
        (
        SELECT  1
        UNION ALL
        SELECT  num + 1
        FROM    q
        WHERE   num < 2000
        )
INSERT
INTO    testdb(testcolumn)
SELECT  1
FROM    q
OPTION (MAXRECURSION 0)

This will be faster.

Quassnoi
I believe that this command is just wrapping 1000 implicit transactions in an explicit transaction. If anyone has tested this though, I would defer to them.
marr75
I didn't realise how much of a difference this made until I tried it myself - 200,000 rows added using the poster's code wrapped in a transaction in 1.3 seconds. Without the transaction, it took 47 seconds.
CodeByMoonlight
Almost every article in my blog tagged `sqlserver` contains a script to populate the test tables. I usually use a repeatable `RAND` approach to generate the test data, so the scripts use this approach and are very fast.
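One possible shape of a repeatable `RAND` generator, offered only as an illustration and not necessarily the scripts used in those blog posts: seed RAND per row with a hash of the row number, so every run produces the same data.

WITH    q (num) AS
        (
        SELECT  1
        UNION ALL
        SELECT  num + 1
        FROM    q
        WHERE   num < 2000
        )
INSERT
INTO    testdb(testcolumn)
-- deterministic pseudo-random values between 0 and 999: same rows on every run
SELECT  CAST(RAND(CHECKSUM(HASHBYTES('MD5', CAST(num AS varchar(10))))) * 1000 AS int)
FROM    q
OPTION (MAXRECURSION 0)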
Quassnoi
awesome, thanks
Peter
Very good. I'd use the single-insert approach, so I would never have thought about looping being better inside one transaction.
gbn
A: 

Having a clustered index (usually the primary key) actually increases insert speed, so verify you have one. And running 1000 transactions against a table isn't the fastest way if you can have all of the data at once and insert it into the table in a single operation (this can be accomplished by using table-valued parameters in SQL Server 2008 or XML parameters in 2005).
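A minimal sketch of the XML-parameter idea on SQL Server 2005 (the variable name, element names, and sample values are illustrative, not from this answer):

-- assumed target table: testdb(testcolumn int)
declare @xmlData xml
set @xmlData = '<rows><r v="1"/><r v="2"/><r v="3"/></rows>'

-- shred the XML into rows and insert them in one set-based statement
insert into testdb (testcolumn)
select r.value('@v', 'int')
from   @xmlData.nodes('/rows/r') as t(r)

In a real application the XML would be built on the client and passed in as a parameter, so thousands of rows arrive in a single round trip.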

marr75
+2  A: 

In addition to indices, if your actual scenario is as per your example, you could use a set-based approach to insert 2000 records like this:

INSERT testdb(testcolumn)
SELECT 1
FROM master..spt_values
WHERE number BETWEEN 1 AND 2000
AdaTheDev
I'd change `1` to a `ROW_NUMBER`, though :)
Quassnoi
That's relying on spt_values having more than 2000 values, though. True for SQL2005/2008, but I don't like this method. My spt_values has 2193, barely enough.
CodeByMoonlight
And as you get higher numbers, you can have gaps, so agreed that you should use row_number. A more reliable table is sys.columns if you have a substantial enough schema.
Aaron Bertrand
@Quassnoi - I was just going on the opening post, which just added 1's. @CodeByMoonlight - I was just demonstrating with the exact scenario posted in the question, for which it does have enough values. If more are needed, you'd just cross join it to itself to get more. The main point I'm making here is to try a set-based approach!! I'd choose a set-based approach any day of the week wherever feasible.
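A rough sketch combining the two suggestions from this thread, ROW_NUMBER for the values and a self cross join for volume (the 100,000-row count is only an example):

-- generate 100,000 numbered rows without depending on the size of spt_values
INSERT testdb(testcolumn)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY a.number, b.number)
FROM master..spt_values a
CROSS JOIN master..spt_values b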
AdaTheDev
A: 

I would google "SQL Server tuning"... There are many books written on the subject. It is a very hard thing to solve as there are MANY things that affect speed, from query syntax, to the RAM allocated to the server, to how that RAM is divided among the parts of SQL Server, to RAID array configuration, and MANY other factors. You can have a database server optimized for inserts/updates (OLTP) or for querying (data warehouse type of stuff). In other words, don't expect a single, simple answer to this, even though your problem seems straightforward.

This is why you have database server administrators.

Or you could just not sweat the server-side issues and optimize your client code as much as possible, if timing is not very important to you.

I would look into prepared statements and transactions as a way to begin optimizing. Then look at indexing (if this is a set of inserts that does not happen very often, I would consider dropping the indexes, doing the import, and creating the indexes again). A sketch of both ideas follows.
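A minimal sketch of those two suggestions; the index name IX_testdb_testcolumn is hypothetical, and sp_executesql stands in here for a client-side prepared/parameterized command:

-- a parameterized statement: the plan is compiled once and reused
exec sp_executesql
     N'insert into testdb (testcolumn) values (@val)',
     N'@val int',
     @val = 1

-- drop/import/recreate pattern for an infrequent bulk import
DROP INDEX IX_testdb_testcolumn ON testdb
-- ... perform the inserts here ...
CREATE INDEX IX_testdb_testcolumn ON testdb (testcolumn)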

gmagana
+6  A: 

1) Log Flush on commit. Every transaction has to ensure the log is flushed to the disk before the commit returns. Every INSERT statement is an implicit transaction. Bulk commit:

declare @i int
set @i = 0
set nocount on
begin transaction
while @i < 2000
begin
    insert into testdb(testcolumn)
    values (1)
    set @i = @i + 1
    if (@i % 1000 = 0)
    begin
        commit;
        begin transaction;
    end
end
commit

2) Slow disk. Check the Avg. Disk sec/Transfer performance counter for your data and your log disks.
3) Too many indices (unlikely on a test table). Each index is nearly as expensive as a 'table' for inserts.
4) Triggers (again, unlikely)

Ultimately, measure. Follow the guidelines of a whitepaper like Troubleshooting Performance Problems in SQL Server 2005 if you don't know where to start.

Remus Rusanu
Your explanation is good, and I'd upvote, but I'm confused by your use of `commit; begin transaction;` in your loop. Your code example doesn't appear to be showing a bulk commit, but rather an explicit commit on each insert. Or am I missing something?
Daniel Pryden
It commits every 1000 inserts (@i modulo 1000 = 0 -> every 1000 inserts do a commit). I wanted to show that it is not necessarily required to commit everything in one single transaction; there could be millions of inserts. The important thing is to fill up the log page before the flush (commit).
Remus Rusanu
Good point on batched commits
Quassnoi
+2  A: 

You have plenty of tools/techniques to get more performance out of this type of workload.

  1. If appropriate, bulk load anything you can (see the BULK INSERT sketch after this list). Some things you can't: validation needs to run against the records, the destination table has nullable columns...
  2. Consider moving complex data warehousing/ETL operations to a staging database with minimal transaction logging (aka the simple recovery model). This will improve performance greatly. Then batch/bulk the data into the destination system.
  3. Batch non-bulk-load insert operations. Commit every n records; start with 1,000 and performance-tune from there.
  4. Improve the speed of your disk storage. Smaller, faster disks are much better than bigger, slower ones. On the last DB performance tuning project I worked on, we moved from local 10,000 RPM disks to a SAN, then back to solid state disks on the server for some operations. Solid state most definitely rocks! But it is expensive.
  5. Use the force, um, performance tuning tools for SQL Server to find less obvious bottlenecks. Sometimes the best course of action might be to drop and rebuild indexes based on what % of records are being inserted/deleted compared to the table size, disable triggers during certain operations, and modify the sparseness of records in data blocks.
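A minimal BULK INSERT sketch for point 1; the file path and delimiters below are placeholders, not details from the answer:

-- load a flat file straight into the table instead of row-by-row INSERTs
BULK INSERT testdb
FROM 'C:\data\testcolumn.txt'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK   -- table lock allows the load to be minimally logged
)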
Chad
A: 

Insert speed is driven by the following things:

  1. The speed of your log disk. In particular, it's important that the log be on a volume by itself, so that disk seeks don't slow things down (can be a 40x effect)
  2. The structure of your table and associated indexes / keys / triggers, etc.
  3. The size of your transactions. Larger transactions require fewer round-trips to the log disk, and less associated overhead.
  4. The size of your command batches. Larger batches are more efficient than many individual ones (a small sketch follows this list).
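As one illustration of point 4: on SQL Server 2005 a single statement can carry several rows via UNION ALL (multi-row VALUES only arrives in 2008), and sending such statements in fewer, larger batches cuts down on round trips:

-- three rows in one INSERT statement instead of three separate statements
INSERT INTO testdb (testcolumn)
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3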

In case it's of any interest, I go through this in detail in my book (Ultra-Fast ASP.NET), including benchmarks and example code.

RickNZ