tags:
views: 269
answers: 6

I am testing the speed of inserting multiple rows with a single INSERT statement.

For example: INSERT INTO [MyTable] VALUES (5, 'dog'), (6, 'cat'), (3, 'fish')

This is very fast until I pass 50 rows on a single statement, then the speed drops significantly.

Inserting 10000 rows in batches of 50 takes 0.9 seconds; in batches of 51, it takes 5.7 seconds.

My question has two parts:

  1. Why is there such a hard performance drop at 50?
  2. Can I rely on this behavior and code my application to never send batches larger than 50?

My tests were done in C++ with ADO.

Edit: It appears the drop-off point is not 50 rows, but 1000 total values (rows × columns). I get similar results with 50 rows of 20 columns or 100 rows of 10 columns.
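For part 2, this is roughly what I mean by capping the batch size on the client. A minimal sketch; `BuildInserts` and `kValueCap` are names I made up, the cap comes from the measurements above, and in real code the values would of course need proper quoting/escaping or parameterization:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Observed threshold from the tests above: performance drops once a single
// INSERT carries more than ~1000 individual values (rows * columns).
const std::size_t kValueCap = 1000;

// Splits the rows into multi-row INSERT statements, each staying under the cap.
std::vector<std::string> BuildInserts(const std::string& table,
                                      const std::vector<std::vector<std::string>>& rows) {
    std::vector<std::string> stmts;
    if (rows.empty()) return stmts;
    const std::size_t cols = rows[0].size();
    // e.g. 50 rows per batch at 20 columns, 100 at 10 columns.
    const std::size_t rowsPerBatch = std::max<std::size_t>(1, kValueCap / cols);
    for (std::size_t i = 0; i < rows.size(); i += rowsPerBatch) {
        std::string sql = "INSERT INTO " + table + " VALUES ";
        for (std::size_t r = i; r < rows.size() && r < i + rowsPerBatch; ++r) {
            if (r != i) sql += ", ";
            sql += "(";
            for (std::size_t c = 0; c < cols; ++c) {
                if (c != 0) sql += ", ";
                sql += rows[r][c];  // values are assumed pre-formatted SQL literals
            }
            sql += ")";
        }
        stmts.push_back(sql);
    }
    return stmts;
}
```

With 10000 rows of 20 columns this produces 200 statements of 50 rows each, matching the fast case in the timings above.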

A: 

Did you also compare with the "union all" approach shown here? http://blog.sqlauthority.com/2007/06/08/sql-server-insert-multiple-records-using-one-insert-statement-use-of-union-all/

I suspect there's an internal cache/index that is used up to 50 rows (it's a nice round decimal number). Past that, it falls back on a less efficient general-case insertion algorithm that can handle arbitrary numbers of rows without using excessive memory.

Mark B
Just tried the UNION ALL approach. It is also quite slow. Around 7 seconds for 10000 rows.
Todd
A: 

The slowdown is probably the parsing of the string values: VALUES (5, 'dog'), (6, 'cat'), (3, 'fish'), and not an INSERT issue.

Try something like this, which will insert one row for each row returned by the query:

INSERT INTO YourTable1
    (col1, col2)
    SELECT
        Value1, Value2
        FROM YourTable2
        WHERE ... -- rows will be more than 50

and see what happens

KM
A: 

If you are using SQL 2008, then you can use table-valued parameters and just do a single insert statement.

Personally, I've never seen a slowdown at 50 records, even with regular batches. Regardless, we moved to table-valued parameters, which gave us a significant speed increase.
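A minimal sketch of the idea (the type, procedure, and column names here are made up for illustration):

```sql
-- Declare a table type matching the rows you want to send.
CREATE TYPE dbo.MyRowType AS TABLE (id INT, name VARCHAR(50));
GO
-- The procedure takes the whole row set as one READONLY parameter.
CREATE PROCEDURE dbo.InsertMyRows @rows dbo.MyRowType READONLY
AS
    INSERT INTO [MyTable] (col1, col2)
    SELECT id, name FROM @rows;
GO
```

The client then passes all the rows as a single structured parameter, so the whole batch is one round trip and one statement, with no giant VALUES list to parse.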

Chris Lively
I tried this, and as a SQL novice, I'm not certain I'm doing it in the optimal way. However, my tests show anywhere from 6 seconds to 30 seconds per 10000 rows, depending on batch size. Nothing close to the multi-row INSERT statements.
Todd
That is very strange. Are you assigning all of the fields in the table, or are some assigned through the column default values? If the latter, these can get expensive (like the getdate() function)...
Chris Lively
I followed the example here: http://msdn.microsoft.com/en-us/library/bb510489%28SQL.100%29.aspx. But instead of inserting into the temporary table from another table, I insert values using a multi-row insert. I specify all values, no defaults.
Todd
A: 

Random thoughts:

  • is it completely consistent when run repeatedly?
  • are you checking for duplicates in the 1st 10k rows for the 2nd 10k insert?
  • did you try batch size of 51 first?
  • did you empty the table between tests?
gbn
Yes it's very consistent. And I do empty the table between every test.
Todd
@Todd: interesting. Good question. And I've no further ideas though :-)
gbn
+1  A: 

It could also be related to the size of the row. The table you use as an example seems to have only 2 columns. What if it has 25 columns? Is the performance drop off also at 50 rows?

MJB
Interesting. The above table was just an example; my actual table has 20 columns in it. I cut down to 10 columns and tried again. Now I get the cut-off at 100 rows: batches of 100 take about 0.6 seconds, and batches of 101 take about 3.25 seconds. It appears the limit is not 50 rows, but 1000 individual inserted values?
Todd
It sounds like the cutoff is more like total data in memory. If a table is small, then perhaps you can fit 100 rows into a "block", whatever a block is. But large tables, with many columns or with few large columns (e.g., VARCHAR(1200)) would limit the number of rows you could fit into a block.
MJB
A: 

For high-volume and high-frequency inserts, consider using Bulk Inserts to load your data. It's not the simplest thing in the world to implement, and it brings with it a new set of challenges, but it can be much faster than doing an INSERT.
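For example, something along these lines (the file path and delimiters are just placeholders for whatever your export produces):

```sql
-- Load a whole file server-side in one statement; TABLOCK helps throughput.
BULK INSERT [MyTable]
FROM 'C:\data\rows.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);
```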

John Dibling