views:

8369

answers:

10

Hi,

Currently I have this type of query, generated programmatically (C#):

INSERT INTO TableName (Field1, Field2, Field3)
SELECT Field1, Field2, Field3 FROM TableName2

The problem is that the SELECT can return many records (like a million), so it takes a long time and the result is a connection timeout.

Also, if I separate it all into single inserts (for this example, one million insert queries), it takes a very long time to execute ... but it works ...

Is there a way I can improve this type of query?

I use MSSQL 2005

Thx

+6  A: 

Well, if it is a full copy, I wonder if you shouldn't look into bulk load tools?

  • BULK INSERT (TSQL)
  • SqlBulkCopy (.NET)
  • bcp (command line)
  • etc

If you had a Where clause, I'd check that it was suitably indexed...

Additionally:

  • perhaps drop indexes and triggers before doing the INSERT (recreate afterwards)
  • consider dropping the entire table and using SELECT INTO ? (see comments)
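
As an example of the BULK INSERT route, a full copy could be staged out with bcp and reloaded; this is only a rough sketch (the file path, export format and batch size are placeholders to adapt):

-- assumes the source was first exported in native format, e.g.
--   bcp "SELECT Field1, Field2, Field3 FROM TableName2" queryout data.bcp -n -T
BULK INSERT TableName
FROM 'C:\load\data.bcp'
WITH (
    DATAFILETYPE = 'native',  -- matches the -n native export
    TABLOCK,                  -- take a table lock for the load
    BATCHSIZE = 50000         -- commit in chunks rather than one huge transaction
)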
Marc Gravell
Don't do select into... It will lock system tables until the query finishes because the table is being created in the same transaction. It will net no performance gain anyways.
Brian Rudolph
@Brian - fair enough ;-p
Marc Gravell
+2  A: 

Set the CommandTimeout property of the SqlCommand you're using to a sensible value (10 minutes or something). Remember that CommandTimeout is in seconds, so 10 minutes is 600.

Anton Gogolev
+6  A: 

I've found that, if you have a lot of INSERT statements executed in sequence, you can improve performance by adding a 'GO' statement after every xxxx insert statements:

...
INSERT INTO Table ( ... ) VALUES ( ... )
INSERT INTO Table ( ... ) VALUES ( ... )
INSERT INTO Table ( ... ) VALUES ( ... )
GO
INSERT INTO Table ( ... ) VALUES ( ... )
INSERT INTO Table ( ... ) VALUES ( ... )
...

Another possibility, maybe, is to make sure that your INSERT INTO .. SELECT FROM query doesn't insert everything in one go; instead, use some kind of paging technique:

INSERT INTO Table ...
SELECT ...
FROM OtherTable WHERE Id > x and Id < y
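
As a rough sketch of that paging loop, assuming TableName2 has an integer key column named Id (the 10000 batch size is just a placeholder to tune):

DECLARE @batchSize int, @minId int, @maxId int
SET @batchSize = 10000
SELECT @minId = MIN(Id), @maxId = MAX(Id) FROM TableName2

WHILE @minId <= @maxId
BEGIN
    -- copy one key range per iteration so no single transaction touches millions of rows
    INSERT INTO TableName (Field1, Field2, Field3)
    SELECT Field1, Field2, Field3
    FROM TableName2
    WHERE Id >= @minId AND Id < @minId + @batchSize

    SET @minId = @minId + @batchSize
END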
Frederik Gheysels
A: 

You don't indicate what problem you're solving with this approach. Obviously a WHERE clause would narrow the record set. But if the result set isn't going to be modified in the new table, then why replicate the data at all? Why not query the source directly?

SAMills
A: 

Either bulk load by using a file and then bcp/BULK INSERT, or batch it up in batches of 5K or so.

SQLMenace
A: 

In the first place, never try to insert a million records through C#, and never process large groups of records one at a time. This is work that should be done at the database, by the database. Use bulk insert, SSIS or DTS to do this, and then schedule it as a job during off hours. If it still takes too long, then I suggest you run it in batches of several thousand (you will have to experiment with your own database to see what the best choice is, as the number you can safely process depends greatly on the tables, the indexing, how fast your server is and how many users are also trying to work against the same tables).

HLGEM
SqlBulkCopy from C# is fine...
Marc Gravell
And, he is using the database. The C# code just kicks off the query.
Joel Coehoorn
A: 

Another way that we have used in the past is to create a temp table with the primary keys we want to move and use a while loop. This way you can do it in a kind of block fashion, so you avoid the large transaction overhead if you cancel and it has to roll back.

Basically what you end up doing is an INSERT INTO tablename (...) SELECT (...) FROM tablename WHERE primarykey IN (SELECT TOP 10000 key FROM temptable).

You want that TOP 10000 set of keys in a secondary result set as well, so you can remove them from the temp table and they don't get processed again.
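
A rough sketch of that approach, reusing the question's table names (#keys, primaryKey and the 10000 batch size are placeholders; the TOP queries use ORDER BY so the inserted and deleted batches are the same rows):

SELECT primaryKey INTO #keys FROM TableName2

WHILE EXISTS (SELECT 1 FROM #keys)
BEGIN
    -- copy the next batch of rows
    INSERT INTO TableName (Field1, Field2, Field3)
    SELECT t.Field1, t.Field2, t.Field3
    FROM TableName2 t
    WHERE t.primaryKey IN (SELECT TOP 10000 primaryKey FROM #keys ORDER BY primaryKey)

    -- remove the batch we just copied so it isn't processed again
    DELETE FROM #keys
    WHERE primaryKey IN (SELECT TOP 10000 primaryKey FROM #keys ORDER BY primaryKey)
END

DROP TABLE #keys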

Yet another way would be to use cursors to reduce the number of records you process at a time.

Another loop method would be to do something like this in a while loop.

declare @stop as int
set @stop = (select count(*) from sourceTable where primaryKey not in (select primaryKey from destinationTable))

while (@stop > 0)
begin
    begin transaction

    insert into destinationTable (...)
    select top 10000 (...)  -- pick a batch size that suits your system
    from sourceTable
    where primaryKey not in (select primaryKey from destinationTable)

    commit

    set @stop = (select count(*) from sourceTable where primaryKey not in (select primaryKey from destinationTable))
end

Not the most efficient, but it will work and should help keep the transaction log down. Unless you need the locking, also make sure to use the NOLOCK hint on the source so that you don't block other transactions while doing this large move (unless you use BCP or DTS, as they are much, much faster).

Some of what has been said is probably your best bet though. Use BCP, DTS, or some other bulk tool. If you can drop indexes, it will make things go a lot faster.

Joshua Cauble
Creating the temp table with the keys alone will peg tempdb and just make the IO issues worse.
Brian Rudolph
True, it does. I'd still prefer to use DTS or some other BCP tool in this situation. Trying to do it manually can just be painful and resource intensive.
Joshua Cauble
+1  A: 

Some good answers here.

I'd just like to add that if you have indexes on the destination table, they will slow down the operation. However, rebuilding the index can sometimes take a long time if you go the drop/create route.

If you don't want to drop the indexes, use an ORDER BY in your SELECT that matches the destination's clustered index; this seems to help (probably by minimizing page splits).
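
For example (a sketch that assumes the destination's clustered index is on Field1; substitute your actual clustering key):

INSERT INTO TableName (Field1, Field2, Field3)
SELECT Field1, Field2, Field3
FROM TableName2
ORDER BY Field1  -- matches the destination clustered index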

Cade Roux
A: 

Ok, there are a few fundamental issues.

  1. I/O - Inserting into a table while reading from another table will most likely cause disk contention if the tables are not on separate disks. Put the opposing tables onto physically different spindles.

  2. Transaction Log - You need to ensure that your transaction log is on its own disk, or work in smaller transactions (a few thousand rows at a time), or use BCP/BULK INSERT, which is minimally logged.

  3. Clustered indexes - If you are inserting all of these rows into a target table, and its clustered index (the physical order data is written to disk) is not being written to sequentially, the disk I/O requirements go through the roof because of page splits and re-allocation. An easy fix can be to create a clustered index on the recipient table on a sequentially seeded key; this will generally ensure that you get sequential writes into the table, almost always at the end (see the sketch below).

  4. File Expansion - Ensure that you have SQL Server set to expand its files at a decent rate, say 10% or so. Otherwise it will have to constantly resize its files and zero out the disk. There are also ways to avoid zeroing out the disk, such as granting the SQL Server service account the 'Perform volume maintenance tasks' right (instant file initialization) in your group or local security policy.

Quite frankly, other than that and a few of the other suggestions, it is highly unlikely you will make an insert of millions of rows in a single transaction truly fast. If you did this via bulk insert, it would be drastically faster, though it might not be what you need from an application perspective.
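
Rough sketches for points 3 and 4 (LoadId, MyDatabase and MyDatabase_Data are placeholder names to replace with your own):

-- point 3: give the recipient table a sequentially seeded clustered key,
-- so new rows land at the end of the table instead of causing page splits
ALTER TABLE TableName ADD LoadId int IDENTITY(1,1) NOT NULL
CREATE CLUSTERED INDEX IX_TableName_LoadId ON TableName (LoadId)

-- point 4: grow the data file in larger steps instead of the small default increment
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase_Data, FILEGROWTH = 10%)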

Brian Rudolph
A: 

Have you tested the SQL through SQL Server Management Studio to see how long it actually takes? I would start there. You may be able to improve the performance of the SELECT, and you might be able to improve the insert performance with a TABLOCK hint on the table you're inserting into.
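
For example, just a sketch of the hint placement using the question's table names:

INSERT INTO TableName WITH (TABLOCK) (Field1, Field2, Field3)
SELECT Field1, Field2, Field3
FROM TableName2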

Will Rickards